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PREFACE 

This  formal  technical  report  entitled  "3-D  Stress 
Wave  Code  for  the  ILLIAC  IV,"  is  submitted  by  Systems, 

Science  and  Software  (S3)  to  the  Advanced  Research  Projects 
Agency  (ARPA)  and  to  the  Defense  Nuclear  Agency  (DNA) . 

The  report  presents  the  results  of  a  continued  effort 
to  develop  a  versatile  numerical  scheme  for  simulating  3-D 
stress  waves  on  the  ILLIAC  IV  computer  system. 

This  work  was  supported  by  the  Advanced  Research 
Projects  Agency  and  was  monitored  by  rhe  Defense  Nuclear 
Agency  under  Contract  No.  DNA  001  -  7 2  -C- 01 54 .  Colonel  David 
C.  Russell  has  been  the  ARPA  Program  Manager  and  Lt .  Colonel 
F.  J.  Leech  has  been  the  DNA  Project  Scientist. 

Dr.  Gerald  A.  Frazier  has  been  the  S3  Project  Manager. 
Much  help  and  advice  has  been  obtained  from  the  User  Support 
Group  of  the  IAC.  The  authors  are  also  grateful  for  many 
valuable  communications  with  other  users  of  the  ILLIAC  IV 
and  the  ARPANET,  particularly  those  with  Terry  Layman  of  the 
CAC. 
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INTRODUCTION 


I . 


Over  the  past  twenty  months,  we  have  formulated, 
developed,  and  implemented  two  numerical  computer  codes  for 
processing  stress  wave  calculations  on  the  ILLIAC  IV  com¬ 
puter.  Our  primary  goal  has  been  to  develop  a  capability 
for  performing  3-D  wave  calculations  with  a  spatial  resolu¬ 
tion  that  is  comparable  to  conventional  2-D  calculations. 

Such  a  computing  capability  would  most  certainly 
prove  to  be  a  valuable  asset  in  numerous  ground  motion 
studies.  The  Lagrangian  stress  wave  code  that  is  being  de¬ 
veloped  on  the  ILLIAC,  referred  to  as  SKIS  (Stress  Waves  Jji 
Solids),  is  expected  to  have  important  applications  for 
simulating  seismic  phenomena  such  as: 


•  explosions  in  prestressed  and  geologically 
complex  formations. 

•  Spontaneous  earthquake  ruptures  and  near-field 
ground  motions. 

•  Stress  waves  passing  through  laterally  varying 
earth  models. 

•  Stress  waves  impinging  on  buried  and  surface 
structures . 


This  3-D  simulation  capability  is  expected  to  play  an  impor¬ 
tant  role  in  the  Advanced  Research  Projects  Agency’s  (ARPA) 
program  to  discriminate  earthquakes  from  explosions  and  the 
Defense  Nuclear  Agency's  (DNA)  investigations  on  the  vul¬ 
nerability  cf  buried  structures  to  incoming  stress  waves. 


In  order  to  simulate  the  seismi 
above  in  three  spatial  dimensions,  the 
for  handling  very  large  grids.  It  is 
the  advent  of  super  computers  such  as 
highly  sophisticated  numerical  computi 


c  phenomena  itemized 
capability  must  exi 
our  opinion  that,  wi 
the  ILLIAC  IV  and 
ng  algorithms,  detai 
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3-D  wave  calculations  are  now  feasible.  Based  on  predicted 
computing  speeds  for  the  ILLI'XC,  we  have  estimated  that  it 
is  theoretically  feasible  to  process  wave  calculations  at 
the  rate  of  0.10  m-sec  of  computer  time  per  numerical  time 
step  per  3-D  element.  This  amounts  to  10  secs  of  computer 
time  per  numerical  time  step  for  a  3-D  grid  containing  100- 
thousand  elements  (e.g.,  a  50  by  50  by  40  grid).  This  com¬ 
puting  rate  is  nearly  comparable  to  that  achieved  in  2-D 
wave  calculations  for  an  equivalent  50  by  50  grid  using  a 
conventional  serial  computer,  e.g.,  a  UNIVAC  1108. 

Numerically  processing  wave  calculations  in  3-D  grids 
with  more  than  100-thousand  elements  involves  an  enormous 
number  of  calculations.  More  than  one  billion  floating  point 
multiply  and  add  operations  can  arise  from  a  single  computer 
simulation.  Clearly,  considerations  of  computing  efficiency 
become  extremely  important  in  designing  and  implementing  such 
a  computing  scheme.  Also,  the  usefulness  of  such  a  code,  once 
it  is  finalized,  will  depend  on  its  flexibility  for  handling 
a  range  of  geometric  configurations,  boundary  conditions, 
material  types,  etc. 

In  an  effort  to  arrive  at  an  optimum  computing  scheme, 
we  have  carefully  reviewed  existing  numerical  computing  tech¬ 
niques,  both  finite  element  and  finite  difference,  in  order 
to  combine  strong  points  from  each  method  into  a  single  algo¬ 
rithm  in  a  form  that  is  suited  for  parallel  processing  on  the 
ILLIAC.  In  so  doing  we  have  conceived  a  hybrid  scheme  that 
employs  the  finite  element  method  for  spatial  discretization 
(employing  spatial  interpolation  functions  and  a  virtual 
work  expression)  and  follows  a  computing  sequence  that 
resembles  Lagrangian  finite  difference  schemes;  the  consti¬ 
tutive  properties  of  the  material  appear  in  an  isolated 
module  of  the  code  so  that  the  nonlinear  flow  rule  can  easily 
be  altered.  In  addition,  the  SWIS  code,  which  employs  the 


6 


hybrid  algorithm,  contains  some  advanced  features  that  will 
enhance  its  usefulness  in  particular  applications: 

•  The  code  operates,  with  no  loss  of  efficiency, 
in  one-,  two-,  or  three-spatial  dimensions. 

c  The  code  contains  a  flexible  grid  generator 
which  can  be  superseded  in  local  portions  of 
the  grid. 

•  The  calculations  are  performed  in  orthogonal 
curvilinear  coordinates. 

•  The  code  has  a  provision  for  effectively  sup¬ 
pressing  wave  reflections  at  grid  boundaries. 

A  detailed  description  of  the  SWIS  computing  scheme  is  pre¬ 
sented  in  Section  II  of  this  report. 

During  the  early  stages  of  code  development,  the 
ILLIAC  IV  was  not  operational.  The  first  successful  program 
execution  on  the  array  took  place  in  March,  1973.  Because 
no  I/O  facilities  were  available  at  this  time,  computed  re¬ 
sults  had  to  be  extracted  from  core  dumps.  A  linearized 
version  of  SWIS  (described  in  Section  3.2  of  this  report) 
first  became  operational  on  the  array  in  July,  1973.  A 
number  of  plane-wave  calculations  were  performed  at  this 
time  to  test  various  features  of  the  linear  SWIS  code,  e.g., 
artificial  damping  and  transmitting  boundary  conditions. 

One  calculation  was  performed  which  involved  10,000  time 
steps  in  an  effort  to  test  the  stability  of  the  current 
ILLIAC  configuration  and  to  obtain  estimates  of  computing 
rates . 

Following  the  initial  success  with  the  linearized 
version  of  SWIS,  ILLIAC  program  development  for  the  more 


involved  nonlinear  SWIS  code  was  begun  in  September.  As 
the  result  of  improvements  in  the  ILLIAC  computing  system 
and  our  earlier  experiences  in  implementation  on  the  sys¬ 
tem,  this  code  development  work  progressed  rapidly.  The 
nonlinear  SWIS  became  operational  for  1-D  geometries  in 
late  November,  and,  during  the  month  of  December,  both  2-D 
and  3-D  wave  calculations  were  performed  on  the  ILLIAC. 

Based  on  the  central  system  clock  times,  we  estimate  that 
the  general  SWIS  code  is  processing  wave  calculations  at 
the  rate  of  1.2  m-sec  of  computer  time  per  numerical  time 
step  per  3-D  element.  The  entire  code  is  programmed  in 
GLYPNIR,  and,  although  nearly  all  of  the  calculations  are 
carried  out  in  parallel,  no  effort  has  been  made  to  opti¬ 
mize  the  resulting  machine  code.  Also,  we  note  that  the 
ILLIAC  should  ultimately  process  calculations  much  faster 
than  in  its  present  configuration;  perhaps  a  factor  of 
three  or  four  will  be  realized  from  soft wT are  (overlapping 
machine  instructions)  and  hardware  improvements  that  are 
being  considered.  Thus,  we  are  anticipating  somewhat  faster 
computing  rates  in  the  future.  Our  goal  has  been  set  at 
0.10  m-sec  of  ILLIAC  time  per  numerical  time  step  per  3-D 


element . 


II.  NUMERICAL  ALGORITHM 


2.1  INTRODUCTION 

The  numerical  algorithm  that  has  been  designed  for 
the  SWIS  code  contains  features  of  both  finite  element  and 
finite  difference  methods.  In  many  respects,  it  is  like  a 
finite  element  method  in  that  the  continuum  is  discretized 
using  spatial  interpolation  functions  and  a  virtual  work 
principle,  but  the  computing  sequence  is  modeled  after 
Lagrangian  finite  difference  shock  codes.  Figure  2.1  illus¬ 
trates  how  three  distinct  steps  in  a  Lagrangian  finite 
difference  code,  one  of  which  involves  the  constitutive 
properties)  are  combined  into  one  step  in  the  conventional 
finite  element  method  through  the  use  of  a  stiffness  matrix. 
The  SWIS  code  does  not  develop  the  finite  element  stiffness 
matrix  but  rather  directly  computes  strain  rate,  stress,  and 
restoring  forces. 

2  .  2  PROBLEM  INITIALIZATION 

The  following  quantities  are  needed  to  pose  the  stress 
wave  calculation: 

1 .  Coordinate  System  Designation 

a.  Number  of  spatial  dimensions  to  appear  in 
the  numerical  grid. 

b.  Orthogonal  curvilinear  coordinate  system  to 
be  employed  in  the  calculations.  Transforma¬ 
tion  metrics  are  provided  internally  for 
operating  in  Cartesian,  spherical,  and  cylind¬ 
rical  coordinate  systems. 
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Initial  Conditions  for 
Advanced  Time  Step: 


Stress  Rate,  Stress 


Cell-centered-stress  Finite  Difference 
Conventional  Finite  Element 


Fig.  2 . 1- -Schematic  of  the  computational  sequence  for 
propagating  stress  waves  using  conventional 
finite  element  and  finite  difference  methods. 
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Grid  Configuration 


Most  grids  can  be  produced  using  a  flexible  code- 
contained  grid  generator;  however,  a  provision  is 
made  for  superseding  the  grid  generator  in  local 
regions.  The  grid  configuration  is  described  by: 

a.  Spatial  location  of  the  node  points,  and 

b.  Node  map  to  associate  nodes  with  elements. 

Boundary  Conditions  and  Applied  Forces 

No  distinction  is  made  between  internal  nodes 
and  boundary  nodes  in  that  each  directional 
component  of  each  node  point  is  assigned  one 
of  the  following  three  constraint  conditions: 

a.  Unconstrained  with  applied  body  force  or 
surface  traction  to  form  an  array  of 
nodal  forces. 

b.  Const  ained  with  nodal  displacement  com¬ 
ponents  constrained  to  follow  a  specified 
time  history  (moving  or  stationary) . 

c.  Transparent  with  a  boundary  disguised  to 
reflect  almost  no  incident  wave  energy. 

Material  Properties 

Each  element  is  associated  with  a  material 
described  by 

a.  Density. 

b.  Constitutive  properties,  i.e.,  properties 
for  developing  stress  rate  as  a  function 
of  strain  rate  and  stress. 

c.  Dimensionless  coefficient  for  regulating 
the  damping  of  spurious  high  frequency 
numerical  oscillations. 


c.  Time  histories  of  individual  node  points. 

d.  Plot  files  producing  graphical  displays 
of  the  computed  results. 

Default  conditions  and  data  generation  schemes  are 
used  where  possible  to  minimize  the  quantity  of  data  that 
is  needed  to  describe  a  problem. 
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2.3  SPATIAL  DISCRETIZATION 


A  spatial  region  is  discretized  by  subdividing  its 
\olume  Y  into  a  total  of  E  elements,  illustrated  in 
Fig.  2.2.  The  displacement  field,  u^(x,t),  throughout  V, 
is  interpolated  from  spatially  sampled  displacements  u.(X  ,t) 
\%heie  Xn,  n  -  1,  2,  ...  N,  are  isolated  node  points  which 
are  positioned  at  juncture  points  along  element  boundaries; 

N'  is  the  total  number  of  node  points  in  the  numerical  grid. 
The  spatial  interpolation  is  achieved  using  piecewise  smooth 
interpolation  functions  Pn(x)  so  that 

N 

“itx.t)  -  £  pnOO  u.(X„,t)  ( 

n  =  l 

in  which 


5 

nm 


Spatial  derivatives  of  the  displacement  field  are  then  ex¬ 
pressed  in  terms  of  nodal  displacement  by  the  appropriate 
differentiation  of  Eq.  (2.1). 


2’5,1  Orthogonal  Curvilinear  Coordinates 


Curvilinear  coordinates  are  often  better  suited  for 
particular  applications  than  Cartesian;  notably,  cylindrical 
and  spherical  coordinates  are  well  suited  for  explosion  cal¬ 
culations.  The  capability  to  operate  in  spherical  (or 
spheroidal)  coordinates  also  has  value  for  applications  in 
globc.l  seismology.  In  order  to  accomodate  applications 
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Fig.  2 . 2- -Three-dimensional  grid  illustrating  coordinate 
systems . 


which  can  benefit  from  the  use  of  a  particular  coordinate 
system,  the  SWIS  code  has  been  developed  to  operate  in  general 
orthogonal  curvilinear  coordinates  designated  y  (we  reserve 
the  notation  x  for  Cartesian  systems).  A  brief  development 
of  how  this  is  accomplished  is  presented  below. 

Consider  the  point  P  with  Cartesian  coordinates  x 
and  curvilinear  coordinates  y.  We  define  a  local  Cartesian 
system,  x',  with  its  origin  centered  at  P.  Each  component 
of  xT  is  measured  in  the  direction  tangent  to  the  respective 
curvilinear  component  y^  at  point  P,  illustrated  in  Fig.  2.3 
The  metric  coefficients  for  the  orthogonal  curvilinear  system 
y  are  given  by 

3x^ 

hi  =  srrrr  (2.3) 

7  (l) 

where  the  brackets  enclosing  the  subscript  indicate  that 
the  summation  convention  does  not  apply  to  that  particular 
subscript.  Table  I  contains  metric  coefficients  and  their 
curvilinear  spatial  derivatives  for  some  of  the  more  common 
orthogonal  curvilinear  coordinate  systems. 

When  a  scalar  field,  e.g.,  density  =  p(y,t),  is 
differentiated,  the  curvature  in  the  y  system  adds  no 
complications,  and  we  simply  have 


9 p  =  9 p  =  1 _  9p_ 

9x j  3xr  ^k  8yj 


(2.4) 


However,  when  a  vector  field,  e.g.,  displacement  =  u^(y,t), 
is  differentiated,  curvature  in  the  y  system  gives  rise  to 
additional  terms  which  are  developed  in  elementary  texts  on 
tensor  calculus,  see  Spain  (1960)  or  Washizu  (1968).  For 
the  special  case  of  orthogonal  curvilinear  systems  we  write 
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TABLE 


(2.5) 


9u . 


3x  ' 
3 


-  D  .  . ,  u, 
ljk  k 


with 


6  . 


D.  . 


ik  3 


!Hlli  +  aii  ahm 


field 


ijk  ho)  (2-6) 

Spatial  derivatives  of  the  discretized  displacement 


N 

E 

n=l 


(2.7) 


are  conveniently  expressed  using  the  operator  D.  .  ^ : 
3u .  N 

~  («•*)  L,  Dijk  p„(y)  uk(yn.t)  . 


3xj 


n  =  l 


(2.8) 


When 

y 

denotes  a  Cartesian 

system  (i.e  .  ,  y 

=  X 

and 

hi  - 

1), 

we  note  that  D. 

ljk 

reduces  to  6., 
ik 

3~ 
3x  . 

and 

Eq . 

(2.8) 

reduces  to  our  earlier  expression, 

Eq9 

(2.2). 

Strain,  in  the  curvilinear,  discretized  space,  is 
also  conveniently  expressed  using  the  spatial  operation 


1  8ui  1  9u; 

eij^>t)  =  7  — 7  (y,t)  +  j  — L  (y ,  t) 
dx.  z  3X:  ~ 


7  (Dijk  +  Djik)uk^>^ 
N 


=  7  H  CDljk  *  Djik)pn(*)VIn>t) 


n=l 


(2.9) 
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Because  the  grid  deforms  with  the  medium  in  a  Lagrangian 
formulation,  we  retain  only  the  first  order  terms. 

2.3.2  Inner-Element  Interpolation 

In  the  SWIS  coce,  as  in  conventional  finite  element 
computer  codes,  spatia  interpolation  is  defined  element  by 
element.  Localized  interpolation  functions,  which  depend 
only  on  the  geometry  of  a  single  element,  characterize  the 
spatial  variations  within  the  element  so  that 

pn«  -  p'(y) 

for  >  in  V  or  on  S  where  Ve  and  Se  denote  the 
volume  and  the  boundary  surface  of  element  e,  respectively. 

The  result  is  that  u.(y,t)  in  Eq.  (2.7)  depends  only  on  the 
displacement  of  the  node  points  bordering  element  e  when 
X  is  interior  to  or  on  the  boundary  of  element  e.  A  brief 
development  of  how  interpolation  functions  are  expressed 
for  skewed  element  geometries  is  presented  below. 

Let  us  consider  a  local  element  transformation  that 
serves  to  map  skewed  elements  into  a  standard  geometry, 
namely,  a  two-unit  line  segment  in  1-D,  a  two-unit  square  in 
2  D,  and  a  two-unit  cube  in  3-D.  We  df  iignate  this  natural 
element  coordinate  system  z,  illustra.ad  for  3-D,  in  Fig.  2.2. 
The  interpolation  functions  for  a  low  order  3-D  element  are  then 
expressed  in  the  natural  system  as 

'fH*  +  z3Z3^ 

m  which  m  =  1,  2,  ...  8  denotes  the  local  node  number  for 
the  element,  illustrated  in  Fig.  2.2,  and  Z.  =  +  1  denotes 
1  coordinate  value  for  node  m.  The  more  general  ex- 
pression  that  applies  in  D-dimensional  space  is  written 
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(2.10) 


P  ( z) 
*m  ~ 


=  ~T5  TT  (1  +  ZiZim) 


Spatial  derivatives  of  the  element  interpolation  functions 
are  obtained  directly  from  Eq.  (2.10)  to  yield 

3P  (Z)  Z  •  (TT 

=-^TT  ' 

J  Z  i  =1 


(2.11) 


Interpolation  functions  for  higher  order  elements  are  easi.y 
developed  in  the  natural  coordinate  geometry,  see  for  example, 
Frazier,  et  al . ,  1973. 

We  note  that  the  interpolation  functions  expressed 
in  natural  element  coordinates  z  can  be  used  directly  for 
interpolating  field  variables  within  an  element,  e.g.  , 


L 


(2.12) 


In  addition,  the  same  interpolation  functions  serve  to 
express  the  mapping  from  natural  element  coordinates  z  to 
the  global  curvilinear  coordinates  y,  i.e., 


(2.13) 


3y ±  Cz) 


9pm ( z) 

in  ~  Y 
3Zj  im 


(2.14) 


Yim  denotes  the  i —  coordinate  value  of  local  node 


where 

number  m.  Spatial  derivatives  of  the  interpolation 

functions  with  respect  to  global  coordinates,  Eqs .  (2.2)  and 
(2.8)  then  become 

-1 


9p 


m 


37i 


9z  . 
J 


9p 


m 


so  that  the  derivative  of  the  interpolated  displacement  field, 

Eq.  (2.12),  with  respect  to  %  is  computed  using  the  expression 

,D 

9pm 

717.1  TzT  Vzm>t)  .  (2.15) 


9u  ■ 


ST- 


(?,t)  = 


sy.; 


-1  2 


m=l 
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2.4  NOTATIONAL  CONVENTION 


Ihe  notation  that  has  been  adopted  for  the  various 
spatial  coordinate  systems  is  illustrated  in  Figs.  2.2  and  2.3: 

x  —  global  Cartesian  coordinates. 

y  ~  global  orthogonal  curvilinear  coordinates. 

x"  -  local  Cartesian  coordinates  aligned  with  the 
curvilinear  system  at  the  point  x"  =  0. 

z  -  natural  element  coordinates. 

Capital  symbols  X^,  Y^,  and  are  used  to  denote  the 

spatial  coordinate  of  a  node  point. 

As  illustrated  by  Eqs.  (2.8)  and  (2.9),  spatial  deriva¬ 
tives  in  the  various  coordinate  systems  merely  involve  mani¬ 
pulations  on  the  element  interpolation  functions.  Subroutines 
handle  these  transformations  in  such  a  way  as  to  minimize 
complications  in  the  primary  logic  of  the  SWIS  code.  At  the 
same  time  care  has  been  taken  to  assure  that  excess  calcula¬ 
tions  do  not  occur  when  employing  a  simple  Cartesian  system. 

In  the  subsequent  presentation,  matrix  notation  will 
be  used  in  preference  to  subscript  notation  for  denoting 
arrays  that  arise  from  spatial  discretization.  This  enables 
us  to  keep  directional  indices  (subscript  notation)  separate 
from  nodal  indices  (matrix  notation).  For  convenience,  all 
matrices  are  of  order  N,  the  total  number  of  node  points  in 
the  grid;  and  the  symbols  <  >,  {  },  [  ],  and  [""]  are  used 
to  denote  a  row,  column,  square,  and  diagonal  matrix,  re¬ 
spectively.  Using  this  notation,  Eq.  (2.7)  becomes 

ui(y,t)  =  <p(y)>  (u.(t)j  (2.16) 

and  the  local  Cartesian  derivative  of  the  spatially  discre¬ 
tized  displacement  field  is  expressed 
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a  U 

37j  =  <Dijkp(y)>  (ukct) }  . 


(2.17) 


Note  that  we  have  used  upper  case  to  denote  nodal  displacement 
(a  nodal  subscript  replacing  the  spatial  argument),  i.e., 


Um^  = 


(2.18) 


{Ui (t) )  =  ui CVn , t)  ,  n  =  1,  2,  ...  N  . 

finally,  with  regard  to  notation,  we  point  out  that 
the  global  arrays  <p(y)>  and  <D.  p(y)>  are  never  actually 

developed  in  the  computing  algorithm,  but  rather  spatial 
interpolations  are  dealt  with  element  by  element.  An  array 

that  is  localized  to  a  single  element  is  denoted  by  a  super¬ 
scripted  e,  thus 


<P(y)>  =■•  ^2  <pe(y)> 


^ijkP^  =  <Dijkpe(y)>  , 


I  being  the  total  number  of  elements  in  the  global  assemblage. 
Using  this  notation,  p®(y)  is  zero  unless  node  n  is  associa¬ 
ted  with  element  e  and  y  lies  within  or  on  the  boundary 
of  element  e. 
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2.5  CONSERVATION  OF  MOMENTUM 


Conservation  of  momentum  in  a  Lagrangian  framework 
can  be  expressed  by  the  virtual  work  expression 


f  /pii.6u.  +  a..  — 7-i-  -  f.6u.\  dV  -  /  t.6u. 
/  \  1  1  11  3x.  11/  J  ii 

*v  v  3  7  c 


dS  =  0 


(2.19) 


in  which  u^  is  the  particle  displacement,  <$u^  is  a  virtual 
displacement,  i7  is  particle  acceleration,  is  stress, 

P  is  mass  density,  and  f^  and  7^  are  specified  body  force 
and  surface  traction,  respectively.  Sa  is  that  portion  of 
surface  (internal  or  external)  bounding  the  volume  V  to 
which  tractions  are  applied.  For  general  orthogonal  curvi¬ 
linear  coordinates,  the  term  36u./3x(  is  taken  to  be 
D...6U,  ,  the  operator  D...  having  been  defined  above  in 

1  J  1^  1C  1  J 

Eq.  (2.6).  For  a  Cartesian  system,  the  term  simply  becomes 

3  6u . / 3x . . 
i  1 

In  the  spatially  discrete  system  conservation  of 
momentum  is  expressed  by  substituting  the  interpolated  dis¬ 
placement  field  from  Eq.  (2.16)  into  the  virtual  work  expres¬ 
sion  above  to  obtain 


{6Ui)1([M]{Ui)  +  (Ri)  +  (Qi)  -  {F.})  =  0 


(2.20) 


where 


^  ^  J  p<pe>T  <p8>  dV 


e  =  l  V 


tFi}  ■ 


z  L  ?i<pe>T<iv  ♦  E  L  v* 


<pb>T  dS 


e  =  l  V 
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^  /  ajk<Djki 


pe>T  dV 


e  =  l  T 


2  /  qi j  <Dj ki 


pe>T  dV 


e=  1  Vs 


in  which  E  is  the  total  number  of  elements  in  the  grid  and 

B  is  the  number  of  element  surfaces  with  applied  tractions. 

An  artificial  stress  q..  =  q..(e)  has  been  introduced  for 

ij  hj  : 

the  purpose  of  damping  spurious  high  frequency  numerical 
oscillations . 

The  zero  matrix  is  the  only  vector  orthogonal  to  all 
possible  (unconstrained)  virtual  displacements  {6U^},  there¬ 
fore  Eq.  (2.19)  yields  a  series  of  simultaneous  equations  ex¬ 
pressing  conservation  of  momentum  node  by  node: 


[M]{U.(t)}  +  (R.(t)}  +  {Qi (t) }  =  { F i ( t ) } 


(2.21 


In  contrast  to  conventional  finite  difference  methods,  free 
surfaces  and  loaded  surfaces  are  "automatically"  provided  for 
in  the  above  equations  of  motion  through  the  forcing  term 
(F^(t)}.  Node  points  with  a  specified  displacement  time 
history  and  node  points  along  a  transmitting  boundary  are  not 
automatically  handled  by  the  virtual  work  expression,  and 
therefore,  these  constrained  node  points  require  special 
considerations.  For  convenience,  we  have  simply  modified  the 
definition  of  the  forcing  term  at  the  constrained  node  points  so 
that  Eq.  (2.21)  applies,  without  exception,  to  all  points  in 
the  grid.  The  modified  prescription  for  F^n(t)  suited  for 
the  constraint  condition  at  node  n  is  given  below  in  Eqs . 
(2.30)  and  (2.31) . 
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In  contrast  to  the  conventional  finite  element  method, 
we  note  that  the  constitutive  properties  have  not  been  ap¬ 
plied  in  the  development  of  Eq.  (2.21).  The  load-deformation 
properties  of  the  material  are  introduced  below  at  an  inter¬ 
mediate  stage  in  the  time  stepping  scheme. 

2.6  TIME  STEPPING  SCHEME 

In  selecting  a  time  stepping  scheme,  we  have  con¬ 
sidered  the  relative  merits  of  explicit  and  implicit  methods. 
In  order  to  achieve  satisfactory  accuracy  in  the  propagation 
of  sharp  wave  fronts  (wave  length  equal  to  10  to  15  grid 
dimensions),  a  small  time  step  is  needed,  roughly  equal  to 
that  required  for  stability  of  an  explicit  method.  When  a 
numerical  calculation  involves  such  small  time  steps,  ex¬ 
plicit  methods  are  strongly  favored  over  implicit  methods. 
Explicit  methods  generally  require  many  times  fewer  opera¬ 
tions  per  time  step.  The  number  of  multiply  and  add  opera¬ 
tions  for  the  implicit  schemes  used  in  finite  element  codes, 
such  as  SAP,  increases  as  N5^  for  cube-like  blocks  of  3-D 
meida;  whereas  for  explicit  methods  the  number  of  operations 
increases  linearly  with  N,  N  being  the  total  number  of 
node  points  in  the  numerical  grid.  Furthermore,  algorithms 
based  on  explicit  methods  are  simpler  to  program  and  general¬ 
ly  more  flexible  for  introducing  nonlinear  material  response 
behaviors.  The  simplicity  of  our  explicit  wave  calculation 
scheme  has  made  it  possible  to  develop  a  very  efficient 
parallel  algorithm  for  processing  linear  seismic  waves  on 
the  ILLIAC  IV  computer. 


Stress,  (y  ,  t-At)  ,  e  =  1,2,  ...  E; 
l^ift  -  ;  displacement,  {Ui  (t )  } ;  and  node 
(Y^(t)},  are  advanced  in  time  by  At  using  a 


velocity , 
positions 
four  stage 


} 


computing  sequence. 
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rmm#  i- 1  vw«  v^v?  '“fi,”?.™  w  r 


Stage  1:  Strain  Rate 

Compute  strain  rate  for  element 

3u . 


•  e  /  At  \  1  i/vt  At  \  +  1  __j  /  t  _  At\ 

^(y-1  ‘  r) '  2  itjU-1  •  ~)  i  2  J 

■  i  <DljkPeCy5  -  DjikPe(y)>  |6k(*  -  f)! 


(2.22) 


where  ef.fy.t  -  4^1  is  the  strain  rate  evaluated  at  a  dis- 

ij  \~  2  /  _  .  e 

crete  point  within  element  e  (strategic  points  in  v  lor 

evaluating  the  {R. }  integral  of  Eq.  (2.20)  which  may,  under 

1  .  _ 0 

special  conditions,  be  confined  to  the  centroid  point  y  ). 

That  is,  the  terms  <Di;jkpe(y)>  are  evaluated  at  the  inte¬ 
gration  points  for  element  e. 

Stage  2:  Stress  Rate  and  Stress 

Compute  stress  rate  for  element  e 

-  f)  ■  £(r-  e) 

which,  for  linear  isotropic  material,  becomes 

°ij(^>t:  '  f1)  =  2yC  '  T-)  +  X*  6ij  tkk(~,t  '  ¥) 


(2.23) 


2 

(2.24) 


where  ye  and  Xe  are  Lame's  elastic  constants  for  the 
material  in  element  e.  The  stress  at  the  advanced  time 
is  then  computed  by  direct  integration 

o®j(y,t)  =  t_At)  +  At  °ij (y>z  _  t~ )  •  (2.25) 

Compute  artificial  stress  q^j  which  serves  to  damp 
spurious  high  frequency  numerical  oscillations 
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(y,t)  =  At  3ea^  ^ 


(2.26) 


where  3e  ~  0.15  is  a  dimensionless  damping  coefficient.  For 
the  special  case  of  linear  waves  (linear  material  and  small 
displacements)  the  damping  provided  by  the  above  expression 
for  q^j >  with  3  uniform  between  elements,  results  in  the 
damping  of  each  natural  mode  of  vibration  (i.e.,  each  eigen¬ 
function  of  the  linear  system)  as  the  square  of  the  corres¬ 
ponding  natural  frequency  (Frazier,  e_t  al_. ,  1973).  Also,  we 
note  that  no  damping  occurs  in  regions  of  stationary  stress. 
The  isotropic  component,  q^,  is  equivalent  to  linear  damp¬ 
ing  used  in  Lagrangian  finite  difference  shock  codes,  see 
for  example,  Richtmyer  and  Morton,  1967. 

Stage  3:  Restoring  Forces  (equivalent  to  stress  gradients) 

Compute  the  nodal  restoring  forces  that  result  from 
stresses  in  element  e 


(R-(t)  +  Q?(t)} 


f  <Djkip(y)>1 


qJkCy,t))  dv  . 


(2.27) 


In  many  applications,  a  single  integration  point  at  the  centroid 
of  the  element  is  sufficient  for  evaluating  the  integral.  How¬ 
ever,  for  cases  in  which  the  strain  energy  that  is  neglected  by 
sampling  only  at  the  centroid  becomes  significant,  stress  is 
computed  (Stages  1  and  2)  at  two  points  for  each  spatial  dimen¬ 


sion  of  the  element,  and  in  this  way  the  spatial  variations 
within  an  element  are  treated  in  the  integration  above.  Further 
discussion  on  the  treatment  of  inner  element  variations  in 
stress  is  presented  in  Appendix  A.  As  an  incidental  note, 
Frazier,  e_t  a_l.  ,  1973  ,  have  shown  that  for  rectilinear  3-D 
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grids ,  a  one-point  integration  procedure  is  equivalent  to  the 
ce 1 1  -  centered- st res s  finite  difference  method  that  is  commonly 
used  in  Lagrangian  shock  codes. 

The  integration  above  is  performed  for  each  element  to 
obtain  the  restoring  force  of  the  medium  on  all  the  nodes  in 
the  grid 

E 

{Ri  Ct)  +  Qi  (t)  }  =  ^  {R®(t)  +  Q®  (t)  }  .  (2.28) 

e=l 

Actually,  the  global  arrays  { R± (t) }  and  (Q.(t)}  are  not 
stored,  but  rather  the  effects  of  the  element  restoring  forces 
are  directly  accumulated  in  the  nodal  acceleration  calculations 
developed  in  Stage  4. 

Stage  4;  Motion  of  the  Node  Points 

Nodal  accelerations  are  computed  directly  from  Eq.  (2.21) 

^ (t) }  =  [M]  ( F i ( t )  }  -  [M] " 1 ( Ri ( t )  +  Qi  (t ) >  (2.29) 

where  the  so-called  lumped  mass  matrix  [M]  is  obtained  by 
replacing  each  diagonal  term  in  the  distributed  mass  matrix 
[M]  by  the  sum  of  the  terms  in  the  row  in  which  it  appears. 

This  operation  yields  a  diagonal  mass  matrix  thereby  making 
the  inversion  of  the  mass  matrix  in  Eq.  (2.29)  trivial. 

Equation  (2.29)  directly  applies  to  all  unconstrained 

internal  and  boundary  nodes,  including  internal  nodes  with 

applied  body  forces  and  boundary  nodes  with  applied  tractions 

(zero  or  otherwise).  A  modified  prescription  for  F.  (t'l 

in 

is  used  at  constrained  node  points  so  that  Eq.  (2.29)  is  uni¬ 
versally  applicable  to  all  node  points  in  the  grid. 
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The  SWIS  code  accommodates  two  types  of  constraint 
conditions:  The  first  type  involves  a  constrained  component 
of  displacement  in  which  displacement  is  made  to  follow  a 
prescribed  time  history  ^(Y  »t).  This  condition  is 


satisfied  with 


M 


Fin(t>  *  Rin^  *  ^in^)  +  ^  V*.,'1  *  4t> 


M 

At 


nn 

2 


uinM  -  sr1  “in(t  -  f) 


(2.30) 


where  n  denotes  the  node  number. 


The  second  type  of  node  constraint  involves  a  trans¬ 
parent  boundary  condition  in  which  a  boundary  point  is  made 
to  reflect  almost  no  energy.  In  this  case  the  nodal  forcing 
term  is  set  to 


Fin^  *  7  *!„<*>  *  |Qin(t)  '  ^fOia(t  -  f) 


(2.31) 


Nodal  velocities  at  the  advanced  time  t  +  At/2  are 
computed  by  direct  (numerical)  integration  of  the  nodal  ac¬ 
celerations 


T  T 


In  rt-~) I 


+  At  U.(t)< 


i  i\  2/|  •  | 

and  similarly,  the  nodal  displacements  are  advanced  in  time 


(2.32) 


(Ui(t  +  At)}  =  (U 


it'))  *  “I'M*  *  f1)! 


(2.33) 


and 


{ Y. 


±  Ct  ♦  At))  =  {Yi  (t)  }  *  it[H(i)]-‘  jui(t  ♦  |E-)j 


(2.34) 
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where  [H^.  is  a  diagonal  listing  of  the  i —  metric  co¬ 
efficient  for  each  node  in  the  grid.  Equations  (2.32),  (2.33), 
and  (2.34)  are  employed  for  all  node  points  in  the  grid;  no 
exception  is  made  at  this  point  for  the  calculations  at  con¬ 
strained  nodes. 

The  sequence  of  calculations  outlined  in  the  four 
stages  above  yields  the  necessary  variables  for  continuing 
into  the  subsequent  time  step,  i.e.,  set  t  =  t  +  At  and  re¬ 
turn  to  Stage  1.  Approximately  500  floating  point  multiply 
and  add  operations  are  required  per  3-D  element  to  advance 
the  solution  one  time  step  using  Cartesian  coordinates  and 
a  one-point  integration  scheme  in  Stage  3.  To  put  this 
number  is  perspective  we  note  that  roughly  one-half  of  this 
effort  would  be  required  per  node  to  multiply  the  non-zero 
terms  in  a  3-D  finite  element  stiffness  matrix  by  the  nodal 
displacements  (about  250  floating  point  multiply  and  add 
operations).  Thus,  the  algorithm  developed  above  should  be 
exceedingly  fast  for  both  linear  and  nonlinear  stress  wave 
calculations.  This  conjecture  is  supported  by  ILLIAC  test 
calculations,  some  of  which  are  presented  in  Section  IV. 
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2.7  CONSERVATION  OF  ENERGY 

Kinetic  energy,  strain  energy,  dissipated  energy 
(artificial  viscosity),  and  load  potential  are  computed  at 
each  time  step  based  on  the  expressions: 

K(t)  =  kinetic  energy  at  time  t 

■  t|ui  (*  *  f)|T  •  f1),1 

S(t)  =  strain  energy  (internal  energy)  at  time  t 


(2.35) 


=  S(t-At)  +  At  S  (t  -  ~j 

=  S  (t- At)  +  f^ju.^t  -  |r±  (t)  +  Ri(t-At)J  (2.36) 

D(t)  =  dissipated  energy  at  time  t  resulting  from 
artificial  viscosity 

=  D(t-At)  +  At  D^t  - 

T 

=  D(t-At)  +  |^ju.(t  -  ^)|  |  Q  i  ( t )  +  Qi  (t- At)  | 

=  D(t-At)  +  AtjO.(t  -  |^)jT  |  Qi  (t )  | 

L(t)  =  load  potential  (energy  entering  or  leaving 

the  system  through  the  boundaries  or  through 
the  action  of  body  forces)  at  time  t  result¬ 
ing  from  body  forces  and  surface  tractions 
(Eq.  (2.20)),  specified  displacement  time 
history  (Eq.  (2.30)),  and  transmitting  bound¬ 
aries  (Eq.  (2.31)) 


(2.37) 
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=  L(t-At)  +  At  l(t  -  |lj 

L(t-At)  +  f^J^t  "  f^)j  jFiCt)  +  Fi(t-At)j  (2.38) 

lve  note  that  an  approximate  expression  is  employed  for  estimat¬ 
ing  dissipated  energy  D(t)  in  Hq.  (2.37),  because  the  array  of 
dissipation  forces  (Q.(t-At)}  -  BAt  >R.  (t  -  l|l)l  ,  which  is 
needed  for  a  consistent  calculation,  is  not  retained  in  core 
at  the  advanced  time  t. 

When  SWIS  is  operated  without  artificial  damping, 
energy  is  conserved  in  the  calculations,  i.e., 


L(t)  -  K(t)  -  S(t)  =  0  +  computer  round-off 


(2.39) 


with  D(t)  -  0.  Conservation  of  energy  has  been  observed  for 
linear  wave  calculations.  However,  because  the  method  has 
been  formulated  from  an  energy  principle,  Eq.  (2.39)  will  also 
be  satisfied  for  waves  in  nonlinear  materials  with  D(t)  =  0. 

Energy  conservation  can  be  demonstrated  directly  from 
the  discrete  equations  of  motion.  The  two  discrete  equa¬ 
tions  -  Eq.  (2.21),  centered  at  time  t,  and  Eq.  (2.21), 

centered  at  time  t  -  At  -  are  averaged  to  yield  an  equation 
of  motion  at  t  -  At/2 


2At  [M]J°i(t  +  T-)  -  °i(t  -  ^ir)[  +  ||Ri(t)  +  R±  (t-At)| 


+  ljQi(t)  +  Q±  (t-At)j  =  fjF.(t)  +  F.(t-At)j 


in  which  Eq.  (2.32)  has  been  employed  to  replace  (U . ( t) }  and 
{U± (t-At) }  by  1 


iJfl  (t  +  41 

At  )  i  \Z  2 


)  -  Mi  -  T-)\ 
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and 


respectively.  This  expression  is  then  pre-multipl 
velocities  {Opt  -  ht/2))T  to  obtain  variations  in 
the  time  step  At 


ied  by  nodal 
energy  over 


Using  the 
have 

K(t)  -  K 


T 

PH  i 


+  f)j  -  tM1  -  HMM*  ■  ^)| 


'  -  TL)JTjRi(t)  *  Ri(t-At)} 

*  fjM*  -  r)jT[«i(t>  *  lit*-*15! 

■  -  £)|>i<”  +  Fi(t-at)| 


notation  introduced  in  Eqs .  (2.35)  through  (2.38)  we 


(t-At)  +  S(t)  -  S (t -At)  +  D(t)  -  D(t-At)  =  L  (t)  -  L(t-At) 

(2.40) 


Consequently,  all  of  the  energy  in  the  system  has  been  accounted 
for;  changes  in  load  potential  are  reflected  by  changes  in 
kinetic  energy,  strain  energy  (internal  energy),  and  dissipated 
energy.  Beginning  at  some  initial  time  t q  with  K(0)  +  S(0) 

+  d (0)  =  L (0)  ,  Eq.  (2.40)  i  applied  repetitively  from  tQ  to 

t  to  yield 

K(t)  +  S(t)  +  D(t)  =  L  (t)  (2-' 
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with  no  dependence  on  the  constitutive  properties.  We  note 
that,  for  the  case  of  a  yielding  material,  not  all  of  the 
strain  energy  is  recoverable.  Thus  we  see  that  S(t)  con¬ 
tains  both  the  recoverable  strain  energy  and  the  internal 
plastic  work. 

2.8  SPECIAL  CASES:  CARTESIAN  COORDINATES.  RECTILINEAR  GRIDS. 

MB  LINEAR  MATERIALS - ^ 1 

The  preceding  formulation  for  time  stepping  nonlinear 
stress  waves  through  1-D,  2-D,  and  3-D  curvilinear  geometries 
may  involve  procedures  not  commonly  found  either  in  finite 
element  literature  or  finite  difference  literature.  In  an 
effort  to  make  the  presentation  more  easily  understood,  some 
special  cases  will  be  considered. 


2.8.1  Cartesian  Coordinates 


The  presentation  of  the  computing  scheme  is  complicated 
somewhat  by  the  inclusion  of  general  orthogonal  curvilinear 
geometries.  Therefore,  we  will  summarize  the  time  stepping 
procedure  using  Cartesian  coordinates.  This  not  only  removes 
much  of  the  abstractness  from  spatial  derivatives  in  the 
discrete  system  but  also  enables  us  to  focus  on  the  key  opera¬ 
tions  that  are  involved  in  completing  a  time  step. 

As  described  above  in  Section  2.6,  stresses,  velocities, 
displacements,  and  node  positions  are  advanced  in  time  using 
a  four  stage  computing  sequence.  Denoting  parameter  initiali¬ 
zation  as  stage  0,  the  following  operations  are  performed: 


0. 


Initialize  Values 


xi^j  ;  j  ui  Ct  D  j  ;  |0.(t  -  f^)|;  a?^(x,  t  -  At)  , 


e  =  1,2,  ...  E;  e  =  1. 


l 


! 


I 

| 
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Compute  Strain 

Rate 

-  7  < 

,3p6(x) 

v  3x . 

J 

♦  7< 

,3pe(x) 

"  3x . 

(2.4; 


Compute  Stress  Rate  and  Stress 

t  -  -  f (*e,  °e)  --  I 


r)  ■  f(se-  f)  =  [^eif3  (;.  t  -  ft) 

*  *Vk(;.  *  -  f) 


,  t  - 


o  .  (x,t) 

1J  -V 


°ij (£»  t_  At)  +  At 

B#“  ^(x.  t  -  ft) 


,  t  - 


q i j  Cx,t)  =  e  At  ai^x,  t 
Compute  Restoring  Forces 


jR-(t)  *  Q-(t)J  « 


ve  3  ' 


+  qf -j  Cx ,  t:)^ 


(2.4: 

(2.4^ 

(2.4! 


(2.4f 


Steps  1  and  2  are  repeated  for  each  integration  point  x 
located  in  element  e.  Steps  1,  2,  and  3  are  repeated  lor 
each  element  in  the  grid  to  yield 


Lj 

k  (t)  +  Q,  (t)|  =  V  ! Rf  (t)  +  Of  (t)j 


4. 


Compute  Motion 


i.  (t)j  =  [M] " 1  jFi (t)  -  (t)  -  QiOOj 


(2.47) 


with 


■  t  f  Ti  »n  dV  +  t/ji  P"‘ 

J  p  ^-l  sD 


Fin(t)  =  Je 

e-1  Ve 


for  the  case  in  which  Xn  represent  an  unconstrained  in¬ 
ternal  node  or  a  boundary  node  with  applied  tractions  (zero 

or  otherwise)  , 


Fin(t)  *  *  Qin(t)  +  Ui(*n>  1  +  At) 


^  u.  (t)  -  ^2-  0 
At’  in  At 


(*  '  r) 


for  the  case  in  which  is  driven  by  the  specified  time 

history  u^(Xn>  t) ,  or 

Fin(t:)  =  I  Rin(t)  +  I  Qin(t)  '  TSt  ^in^  "  2~) 

for  the  case  in  which  Xn  is  positioned  along  a  transmitting 
boundary . 

(a  L  .  At\)  =  (ft  (t  .  At\)  +  AtSu.(t)!  (2.48) 


lMt  +  ^)l  -  W(*  -  ^)1 

jujft  ♦  At)  |  -  jut(t)j  -  Atjo^t  +  f^j 

jxj(t  +  At)  |  =  {Xi(t)j  ♦  Atjuj(t  +  “)J  • 


(2.49) 

(2.50) 


Time  is  advanced  by  At,  e  is  set  to  one,  and  program  control 
is  returned  to  stage  1. 
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2.8.2  Rectilinear  Grids 


In  general,  the  grid  geometry  will  not  be  rectilinear 
As  discussed  in  Section  2.2.2  and  illustrated  in  Fig.  2.2, 
elements  that  appear  skewed  in  the  global  coordinate  system 
y  are  mapped  into  local  element  coordinates  z  where  each 
element  appears  as  a  cube  (or  a  square  in  2-D) .  Simple  poly 
nomial  functions,  Eq.  (2.10),  are  then  employed  for  inter¬ 
polating  spatial  quantities  over  the  interior  of  the  element 
Spatial  derivatives  with  respect  to  the  local  element  coor¬ 
dinate  system  are  expressed  in  terms  of  derivatives  of  the 
interpolation  functions,  Eq.  (2.11).  Because  the  polynomial 
interpolation  functions  also  express  the  transformation  from 
local  element  coordinates  to  global  problem  coordinates, 
spatial  derivatives  of  the  interpolated  field  variables  are 
evaluated  at  a  specified  point  (-1  <  z.  <  +1)  in  an  element 
through  simple  manipulations  on  the  interpolation-function 
derivatives,  Eq.  (2.15). 

The  interpolation  functions  and  their  derivatives, 
expressed  in  local  element  coordinates,  are  the  same  for 
all  elements  in  the  grid.  A  subroutine  has  been  constructed 
for  producing  the  values  of  these  spatial  functions  at  any 
specified  point  T  <  zi  <  +1.  To  proceed  from  these  values 
and  compute  the  value  of  a  discretized  field  variable  or 
its  derivative  at  the  specified  point  in  the  element  merely 
involves  a  few  matrix  op. rations;  consequently,  the  spatial 
differencing  that  is  involved  in  the  time  stepping  scheme, 
Eqs.  (2.22)  and  (2.27),  has  not  been  expressed  explicitly  in 
the  development.  We  do  not  concern  ourselves  with  these  de¬ 
tails  m  actual  code  development.  However,  we  will  develop 
the  particular  spatial  schemes  that  arise  in  3-D  rectilinear 
grids  to  indicate  nature  of  the  spatial  discretization. 
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Foi  the  special  case  of  a  Cartesian  global  coordinate 
system,  spatial  interpolation  in  a  skewed-brick  3-D  element 
is  expressed  by 


ii (z » t)  =  p®cz)  ue  (t) 

1  m  ~  im  1 


*  F  U+z,) (l+z2) fl+z3)uj1 

*  f 

*  ■  ■■  f  u-^m-zpu-jpu*, 

where  the  second  subscript  on  the  nodal  displacements  denotes 
a  local  node  number  as  designated  in  Fig.  2.2.  The  coordinate 
mapping  from  local  element  coordinates  to  global  coordinates, 
i.e.,  =  x.(z),  is  obtained  by  replacing  ui(z,t)  and 

UimCt)  in  the  above  expression  by  xi(z)  and  X?m,  respectively 

When  we  restrict  the  3-D  element  to  a  rectilinear  brick 
geometry  in  which  x^  is  parallel  to  z^,  the  coordinate 
mapping  reduces  to 


x .  =  z .  i  Axe  +  7^ 
i  i  2  A  m 


(2.51) 
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where 


*!  -  r  E  xi. 

m=l 

Ax®  =  (xe  ,Xe  ,Xe  ,  or  Xe  )  -  (xe  ,Xe  ,Xe  ,  or  Xe  ) 

1  'll  13  IS  17/  '121416  18/ 

Ax®  =  (x®  , X®  , X®  ,  or  X®  )  -  (x®  , X®  ,X®  ,  or  X®  ) 

2  V  21  22  25  26/  \  23*  24*  27*  28/ 

Ax®  =  (x®  , X®  , X®  ,  or  X®  )  -  (x®  , X®  ,X®  ,  or  X®  ) 

3  '31  32  33  34/  \  35  36  37  38/ 

That  is,  X?  is  the  centroidal  point  and  Ax®^  is  the 
element  dimension  in  the  direction  x^.  The  transformatioi 
Jacobian,  given  by  Eq.  (2.14),  then  becomes 


3x .  . 

=  i  Ax®. ,  6 .  . 
2  (l)  ij 


(2.52) 


3z  -  / 


— t —  6.  . 
Ax®  1J 

AX(i) 


for  the  case  of  a  simple  brick  element  geometry. 

We  combine  the  spatial  derivatives  of  the  element 
interpolation  functions  with  the  transformation  Jacobian 
above  as  indicated  in  Eq .  (2.15)  to  obtain  the  spatial 
derivatives  of  the  discretized  displacement  field 


(2.53) 


3x.  -1 
3z, 


Z  til  «i(z..t)  • 
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In  particular  we  obtain 


3u . 
i 

1 

CL> 

X 

>-* 

4  Ax6 
i 

+ 

(1-z 

2 

+ 

(1  +  Z 

2 

+ 

(1-z 

2 

3  ti . 
l 

1 

3x 

2 

4  Axe 

2 

+ 

(1-z 

1 

+ 

(1  +  Z 

1 

+ 

(i-z  : 

i 

3u . 

l 

i 

(ltzPti^,)Kru?2) 


i3~ui4' 


(l  +  z^OO  (U-j-U-j) 


3x  Ah  e 
3  4Ax 

3 


Ci-z^  (uf1-ufs) 


Ci-zi3  ci*z23 (ufz-uf6) 

(l-z.) (l-z;) (Uj4  -  U?8) 
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(2.54) 


At  the  element  center,  z  =  0,  we  obtain  the  familiar 
difference  equation 


Kl  *  Ui3  *  UiS  *  Ui  7  - 


-  U?4  -  U?6 


•  U?8) 


(2.55) 


Similar  expressions  are  obtained  for  (du^/dx  )  and 
(3ui/3x_!)  at  the  centroid  of  the  brick-shaped  element. 

2.8.3  Linear  Materials 

An  important  class  of  problems  that  involve  small 
amplitude  seismic  waves  can  be  treated  using  a  linearly 
elastic  material  model.  We  express  linear  material  behavior 
in  the  stress -strain  relationship 

Qij  =  Cijk£ek2, 

feu  the  general  case  of  anisotropic  and  heterogeneous 
media.  With  this  restriction,  Stages  1,  2  and  3  (Eqs.  (2.22) 
through  (2.28))  combine  to  yield 

{Ri(t)}  =  [Ki  j  )  (Uj  (t)  }  (2.56) 


CQA  (t) >  =  BAt[Kij]{U.(t)} 


(2.57) 


where 


[Kij]  =  IE  J  <D  k£iPe>  Ck£mn  <  DmnjO  dV 


e=l  V 


(2.58) 
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in  which  the  indices  i,  j,  k,  £,  m,  and  n  vary  from  one 
through  the  number  of  spatial  dimensions.  By  restricting  the 
development  to  Cartesian  coordinates,  the  expression  for  the 
stiffness  matrix  can  be  reduced  somewhat 


[Kip 


e  =  l  V 


3x, 


ikjJl 


ini 

3xi, 


dV 


(2.59) 


The  equation  of  motion  for  the  discrete  system, 
Eq.  (2.21),  then  becomes 


[M]  (U 


U±(t)  }  +  BAt[Kij]|ui  (t  -  [Ki:j  ]  (U-(t) }  =  (F.(t)} 


(2.60) 

and  the  resulting  time  stepping  algorithm  (Stages  1,  2,  3 
and  4)  is  expressed  in  a  single  equation, 

(Ui(t+At)}  =  At2 [M]"1{Fi(t)}  +  2{U± (t) } 

-{Ui(t-At)>  -  At2[M]'1[Kij]{(l+B)Uj  (t) 


-  BUj (t -  At) } 


The  linear  SWIS  code,  which  is  described  in  Section  3.2, 
is  based  on  this  linearized  time  stepping  procedure. 


(2.61) 


III. 


ILLIAC  CODES:  DESIGN  AND  DEVELOPMENT 


3.1  OVERVIEW 


In  order  to  implement  a  numerical  code  on  the  ILLIAC  IV, 
two  unique  aspects  of  the  machine  require  special  consideration. 
The  first  is  its  parallel  architecture.  In  designing  a  stress 
wave  code,  our  approach  has  been  to  formulate  numerical  opera¬ 
tions  particularly  suited  to  the  parallel  nature  of  the  machine 
rather  than  to  adapt  an  existing  code  with  numerics  designed 
for  a  conventional  serial  computer.  The  resulting  algorithm 
pioved  relatively  easy  to  write  and  debug  on  the  ILLIAC. 


The  second  aspect  to  be  dealt  with  is  the  method  of  in¬ 
put  and  output.  As  the  normal  access  is  via  the  ARPANET  over 
large  geographical  distances,  we  must  use  new  procedures  in 
program  development  and  debugging.  Program  source  and  input 
data  must  be  delivered  to  the  machine  by  teletype  or  by  file 
transfer  from  another  host  computer  on  the  Net.  Output  must 
return  again  via  teletype  or  by  file  transfer  to  a  host  with 
line  printers.  Even  with  the  availability  of  printer  output, 
interpreting  the  results  from  large  3-D  numerical  simulations, 
for  example,  can  be  exceedingly  tedious  without  facilities 
for  graphical  display. 


Our  first  attempt  at  running  programs  in  this  environ¬ 
ment  occurred  in  March  1973.  At  that  time,  the  only  services 
available  at  the  ILLIAC  IV  host  were  text  editing  and  a 
minimal  mechanism  for  ILLIAC  program  submission.  It  took 
nearly  one  month  to  get  results  from  our  first  run.  The  re¬ 
sults  were  contained  in  a  memory  dump  listing  which  was  re¬ 
ceived  by  ma i 1 . 


Since  then,  the  capabilities  of  the  ILLIAC  system  have 
evolved  rapidly.  The  ARPANET  File  Transfer  Protocol  allows 
us  to  transfer  programs  from  UCSD  to  Sunnyvale  and  obtain 
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line  printer  listings  of  ILLIAC  dump  files.  With  UCSD  only 
a  few  miles  from  S3,  we  can  obtain  substantial  output  from  an 
ILLIAC  run  only  an  hour  after  program  completion  on  the 
ILLIAC.  The  display  software  enables  an  ILLIAC  program  to 
output  selected  quantities  under  program  control.  This 
eliminates  the  tedious  task  of  searching  memory  dumps  for 
computed  results.  Since  September  1973,  the  ILLIAC  instal¬ 
lation  has  maintained  a  job  turn-around  of  roughly  once  or 
twice  a  day  for  two  weeks  out  of  every  month.  With  new  ser¬ 
vices  and  improved  job  turn-around,  the  ease  of  getting  work 
done  has  improved  enormously. 

S3's  first  ILLIAC  code,  a  linearized  version  of  SWIS, 
became  operational  in  July  1973,  and  has  been  exercised  on 
several  modest  problems.  A  more  general  nonlinear  version 
of  SWIS,  presently  about  1200  lines  of  GLYPNIR  coding  in 
size,  is  now  in  the  testing  stage.  As  a  result  of  improved 
job  turn-around,  this  new  SWIS  code  has  progressed  from  de¬ 
sign  stage  to  successful  test  runs  in  slightly  less  than 
three  months.  During  that  period,  ILLIAC  reliability  was 
sufficient  to  debug  the  new  code  directly  on  Lhe  machine 
rather  than  simulate  the  ILLIAC  with  SSK  as  was  often  neces¬ 
sary  before  September.  We  estimate  that  our  recent  code 
design  and  debug  rate  on  the  ILLIAC  has  been  nearly  50  per¬ 
cent  of  the  rate  at  which  we  could  have  developed  comparable 
code  on  S3's  1108  computer.  We  estimate  that  roughly  half 
of  the  ILLIAC  development  is  spent  communicating  with  the 
ILLIAC  host  site  via  an  interactive  terminal. 

In  conclusion,  we  have  had  satisfactory  results  from 
our  first  attempts  to  use  the  ILLIAC  IV.  Part  of  this  suc¬ 
cess  stems  from  careful  selection  of  the  algorithms  we  first 
tried  to  implement.  Code  development  progressed  relatively 
smoothly,  though  was  inhibited  somewhat  by  the  effort  of 
interactive  communication  with  the  ILLIAC  facility.  Further 


attention  must  be  given  to  the  difficulties  of  handling 
large  quantities  of  output  before  major  calculation  can 
be  performed. 

3.2  LINEAR  STRESS  WAVE  CODE 
3.2.1  General  Description 

The  goal  in  our  ILLIAC  code  development  effort  has 
been  to  numerically  simulate  stress  waves  in  3-D  geologic 
materials.  The  first  step  in  attaining  this  goal  was  to 
develop  a  time  stepping  algorithm  for  propagating  small 
amplitude  waves  in  linear  materials.  The  linear  algorithm 
is  formulated  in  Section  2.8.3;  the  algorithm  is  expressed 
by  Eq.  (2.61).  In  limiting  our  attention  to  linear  material, 
we  were  able  to  construct  a  compact,  yet  versatile,  code 
that  eased  our  first  efforts  to  use  the  ILLIAC  IV. 

The  algorithm  accommodates  nonsystematic  node  number¬ 
ing  of  1-D,  2-D,  or  3-D  numerical  grids.  As  there  is  no 
relationship  between  grid  numbering  and  the  number  of  PE's 
(processing  elements)  in  the  array,  very  irregular  grids 
consisting  of  beams,  plates,  and  so  forth,  may  be  analyzed. 
Furthermore,  the  algorithm  is  as  efficient  for  irregular 
grids  as  it  is  for  systematic  grids.  This  is  accomplished 
by  a  work-ahead  procedure  in  which  PE's  simultaneously  com¬ 
pute  contributions  to  the  advanced  displacements  of  several 
different  nodes.  A  serial  machine  requires  261  floating 
point  multiply  and  add  operations  to  obtain  a  nodal  displace¬ 
ment  at  the  advanced  time  step  in  a  3-D  grid  (Eq.  (2.61)). 

The  ILLIAC  algorithm  performs  an  average  of  4  parallel 
floating  point  operations  plus  3  row  sums  for  each  advanced 
nodal  displacement.  Thus,  only  a  small  overhead  has  been 
added  to  accommodate  the  parallel  operations. 
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Based  on  predicted  computing  rates  for  the  ILLIAC, 
it  appears  that  a  carefully  coded  version  of  the  linear 
time  stepping  algorithm  should  run  I/O  bound  on  the  array. 
This  would  require  a  processing  rate  of  roughly  0.4  seconds 
per  tine  step  for  a  10,000  node  3-D  grid.  Accurate  timings 
are  not  available  for  the  present  operational  version  of 
the  linear  code  (programmed  in  GLYPNIR) .  However,  it  appears 
that  our  present  computing  rates  are  considerably  slower 
than  the  theoretical  rate  noted  above. 

3.2.2  Numerical  Problem  Definition 

The  definition  of  a  complex  grid  can  require  a  large 
amount  of  data.  In  the  most  general  case,  the  definition 
must  include  data  for  individually  locating  each  node  and 
some  information  about  the  interconnectivity  of  the  elements 
in  the  giid.  In  addition,  inhomogeneous  material  properties 
must  be  specified  element  by  element  throughout  the  grid. 

In  all,  about  17N  data  items  are  required  to  completely 
describe  a  totally  arbitrary  N-node  3-D  grid.  Such  a  re¬ 
quirement  would  make  3-D  grids  excessively  tedious  to  set 
up . 

In  order  to  make  the  linear  version  of  SWIS  as  flex¬ 
ible  as  possible  and  not  tie  it  down  to  any  particular  grid 
generation  scheme  or  finite  element  scheme,  the  influence 
coefficients  for  the  grid  are  generated  separately  and  be¬ 
come  data  for  the  time  stepping  algorithm.  One  limitation 
of  this  approach  is  that  the  algorithm  can  compute  stress 
waves  only  in  materials  with  linear  stress  -  strain  laws, 
since  no  provision  is  made  for  recomputing  the  influence 
coefficients  during  the  time  stepping.  However,  since  the 
grid  need  only  be  generated  once,  one  may  employ  as  sophis¬ 
ticated  a  grid  generation  scheme  as  desired  involving  curved 
grids  or  structural  appendages,  with  no  effect  on  the 
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efficiency  and  speed  of  the  time  stepping  algorithm. 

In  the  present  code  configuration,  spatial  discre¬ 
tization  is  carried  out  on  a  serial  computer  using  a  con¬ 
ventional  finite^element  code.  This  step  produces  the 
terms  (F.(t)}f  [M]  ,  and  [K^]  of  Eq.  (2.61).  These  terms 
are  then  sorted  by  the  serial  machine  program  into  the 
order  required  by  the  ILLIAC  time  stepping  scheme.  A  sort 
algorithm  has  been  designed  to  perform  this  sort  on  the 
ILLIAC  (Frazier,  et  al . ,  1973),  but  has  not  been  implemented 
as  our  recent  work  has  turned  to  implementation  of  the  non¬ 
linear  version  of  SKIS  (described  in  Section  3.3) 

At  each  time  step,  Eq.  (2.61)  is  processed.  It  has 
the  form 


(U (t  +  At)  }  =  { Y ( t )  }  +  [A]  (ud(t)  } 


(3.1) 


where 


(U  (t)}  =  (1+3) ( U ( t ) }  -  B(U(t-At) } 

(V(t)}  =  At  2 [M] ■ 1 (F(t)}  +  2{U(t) }  -  (U(t-At) } 


(3.2) 


(3.3) 


[A]  =  -At2[M] [K] 

Underscores  are  used  here  in  place  of  directional  component 
subscripts  i  and  j.  Using  nodal  subscripts,  Eq.  (3.1)  be¬ 
comes 


(3.4) 


U  (t  +  At)  =  V  ft)  +  V  A  ud(t)  . 

~n^  /  j  ~nm  ~m  ' 


(3.5) 


The  terms  Un ,  Vr ,  U^,  and  Fn  correspond  to  node  n  in 
the  numerical  grid  and  each  represents  3  floating  point  num¬ 
bers  on  the  computer  for  a  3-D  grid.  Space  is  provided  for 
each  vector  by  storing  sequentially  across  PE's.  For  example, 
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the  64th  term  of  (U(t)},  corresponding  to  the  displacement  of 
the  64th  node  point  in  the  grid,  falls  into  PE  63  (Fig.  3.1). 
The  vectors  wrap  around  in  core  so  that  U65  appears  in  PE  0. 
In  general,  vector  term  n  appears  in  PEk  where  k  is  the 
remainder  of  dividing  n-1  by  64.  In  the  present  version  of 
linear  SWIS,  these  vectors  are  core  contained.  A  three- 
dimensional  grid  of  N  nodes  would  require  3N  storage  loca¬ 
tions  for  each  vector.  With  roughly  131K  words  of  PE  memory 
available,  the  code  is  limited  to  3-D  problems  containing  no 
nore  than  N  =  10,000  nodes. 

The  matrix  of  influence  coefficients,  [A],  is  an 
N  x  n  matrix  of  3  x  3  submatrices,  or  3N  *  3N.  For  a  problem 
of  N  =  104  nodes,  the  matrix  would  consume  roughly  109  words 
of  storage.  However,  [A]  is  sparse  with  each  row  generally 
containing  no  more  than  27  non-zero  3  x  3  submatrices.  This 
is  a  consequence  of  the  connectivity  in  a  3-D  grid  of  skewed 
brick  elements  in  which  each  interior  node  point  is  connected 
directly  to  26  neighbor  nodes  and  each  boundary  node  to  less 
than  26  neighbors.  If  the  matrix  is  compressed  to  remove 
zero  submatrices,  the  storage  is  reduced  to  roughly:  (number 
of  nodes  in  the  grid)  x  (number  of  neighbors  plus  one)  x 
(words  of  storage  for  a  submatrix)  =  104  x  27  x  10  =  2.7  x  106 
words  for  N  =  10'.  The  extra  word  of  storage  for  the  3  x  3 
submatrix  contains  the  row  and  column  position  of  the  sub¬ 
matrix  in  the  uncompressed  matrix. 

These  submatrices  are  stored  on  the  high  speed  disk  in 
an  order  appropriate  for  the  sparse  matrix  multiply  which  is 
performed  repetitively  during  the  time  stepping.  As  illus¬ 
trated  in  Fig.  3.2,  the  column  number  of  the  nonzero  terms 
in  [A]  provides  the  PE  destination,  i.e.,  A^m  is  to  appear 
in  PEk  where  k  is  the  remainder  of  the  division  of  m-1  by 
64.  This  scheme  assures  that,  as  the  influence  coefficients 
are  read  into  PE  memory  from  the  14  disk,  each  coefficient 
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will  arrive  in  the  PE  that  contains  the  corresponding  nodal 
displacement.  For  a  3-D  problem,  9  parallel  multiplications 
can  then  be  performed  for  each  10  rows  of  influence  coeffi¬ 
cients  that  arrive  from  the  disk.  The  only  PE  interactions 
that  are  needed,  even  for  irregular  grid  configurations,  result 
from  summing  accumulated  products  between  PE's  -  one  row  sum 
per  row  of  the  sparse  matrix  [A].  More  detail  on  our  sparse 
matrix  multiply  scheme  is  presented  in  the  following  section 
and  in  Appendix  1 1 . 


3.2.3  Time  Stepping 


The  time  stepping  process  consists  of  the  calculation 
of  Eq.  (3.5)  for  each  time  step.  The  first  term  (V(t)}  of 
Eq.  (3.5)  involves  column  vector  operations  which  require  no 
interaction  amongst  ILLIAC  PE's.  As  a  result,  it  is  easily 
computed  in  a  parallel  process.  Similar  column  vector  onera- 
tions  are  involved  in  the  calculation  of  (Ud(t)}.  The  sig¬ 
nificant  calculation  is  the  multiplication  of  the  vector 
(U  (t) }  by  the  large  sparse  matrix  [A].  This  multiplica¬ 
tion  accounts  for  almost  all  of  the  computation  time  that 
is  required  to  complete  one  numerical  time  step.  A  sophis¬ 
ticated  but  simple  mechanism  has  been  developed  to  perform 
the  sparse  matrix  multipl)  in  parallel  (Frazier,  et  al. ,  1973) 
The  non-zero  terms  of  [A]  in  Eq.  (3.5)  are  arranged  on  disk 
so  that  each  3x3  submatrix  Anm  arrives  in  the  PE  contain- 
ing  Urn*  Furthermore ,  as  successive  terms  of  [A]  are  read 
from  disk  the  matrix  row  numbers  n  increase  monotonically 
(but  not  necessarily  sequentially)  in  each  PE.  This  is  done 
so  that  the  sparse  matrix  multiply  can  be  completed  in  the 
order  of  ascending  row  number. 

The  first  submatrix  A^  to  arrive  in  each  PE  from 
the  disk  (the  Anm  with  the  lowest  row  number  n  that  ap¬ 
pears  in  each  PE)  is  multiplied  by  the  three -component  vector 
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Um  and  the  results  are  accumulated  in  a  buffer  R  along 
with  the  row  number  identifier  n.  This  operation  allows 
some  PE's  to  work  on  the  same  row  number  n  while  other 
PE's  work  ahead  on  other  row  numbers.  Since  several  rows 
may  be  processed  simultaneously,  a  look-ahead  buffer  {R} 
is  maintained  in  each  PE  which  contains  both  the  elements 
R  and  the  row  number  n.  Since  rows  will  continuously  be 
completed  as  new  ones  are  started,  (R)  need  only  be  large 
enough  to  contain  the  maximum  number  of  R's  to  be  worked 
on  at  one  time  in  any  given  PE.  On  the  average,  all  of  the 
multiplies  for  64/27  «  2.4  rows  of  the  sparse  matrix  multiply 
are  completed  after  such  an  operation.  Rows  that  correspond 
to  boundary  nodes  require  less  calculation. 

During  the  matrix  multiply,  a  test  is  made  to  see  if 
all  contributions  from  the  sparse  matrix  multiply  are  ready 
to  be  summed  for  the  node  n#.  If  all  of  the  row  numbers  n 
from  the  submatrix  multiply  are  greater  than  n  ,  then  all 
contributions  for  nfl  are  completed  (all  PE's  are  now  work¬ 
ing  on  contributions  to  higher  node  numbers).  The  contribu¬ 
tions  for  nQ  are  then  summed  and  added  to  the  other  terms 
in  Eq.  (3.5)  to  obtain  the  advanced  nodal  displacements 
yno(t+At^  This  displacement  vector  is  stored  in  PEk,  k 
being  the  remainder  of  n^l  divided  by  64.  If  the  r  ontribu- 
tions  from  row  no+l  are  completed,  then  node  n  +1  is  also 
advanced  in  time  by  summing  contributions  from  “part icipating 
PE's,  otherwise  the  next  submatrix  multiply  in  line  for  each 
PE  is  performed.  The  parallel  submatrix  multiplies,  row  sums, 
and  disk  reads  continue  until  all  of  the  [A]  matrix  has 
been  processed  and  all  nodes  have  been  advanced  in  time.  The 
entire  operation  is  repeated  for  each  time  step.  (A  more 

detailed  description  of  the  sparse  matrix  multiply  appears  in 
Appendix  B.) 
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3.2.4  Tests 


The  first  version  of  linear  SWIS  was  coded  in  GLYPNIR 
and  also  in  ASK.  Several  tests  were  made  with  the  GLYPNIR 
code  on  the  ILLIAC  IV  simulator  at  UCSD  (Frazier,  et_  al., 
1973).  The  GLYPNIR  version  of  linear  SWIS  with  a  simple 
grid  generator  became  operational  on  the  ILLIAC  IV  in  April 
1973.  Because  of  difficulties  with  the  disk  hardware  at 
that  time,  SWIS  was  run  with  the  [A]  matrix  held  entirely 
m  core.  Several  test  runs  involving  the  propagation  of 
planar  P  waves  in  3-D  media  have  been  made  to  check  bound¬ 
ary  conditions  in  the  code.  Successful  runs  of  over  100 
seconds  on  the  ILLIAC  were  completed. 
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3,3  LAGRANGIAN  stress  wave  code 


3,3,1  Grid  Configuration 


devisedAffeXible-SChe'"e’  deSCribed  in  Section  II,  has  been 
"  f0r  nun,ericall>-  Simuuting  stress  waves  in  geologic 

Irid'conr  The.fl°Xlbilit>’  for  handling  highly  irregular 
i  ,  ‘«uratl0ns  has  been  compromised  somewhat  in  adapt- 
mg  the  scheme  to  the  ILLIAC. 


The  genera]  numerical  scheme  admits  mixed  element 
>pes  (e.g  trahedra  and  hexahedra)  with  nonsystematic 
Ode  and  element  numbering.  Because  of  the  difficulties  in 

:r;r,w"tiw  bet,<een  pe,s  *  »  «««"*  » 

un  tne  ILLIAC,  we  associate  grid  rrncc  •  , 

^ria  cross-sections  with  PP '  =  qc; 
lllustrated  in  Pin  t,  t.  aj;  ’  as 

_.sn  •  ,  .  8;  3‘3'  Adjacent  grid  cross-sections  are 

associated  with  adjacent  PF's  sn  i-Knn- 

cen,  i„  M  .  ,  L  5  50  tbat  Points  which  are  adia- 

the  grid  appear  in  the  same  or  adjacent  PE's. 

The  first  grid  dimension,  which  is  normal  to  grid 
section  mentioned  above,  is  strung  across  PE's.  Thi  enables 
a  string  of  64  elements,  lying  in  64  contiguous  cross-sevens 
o  be  processed  in  parallel.  We  note  that  the  first  grid 

coomrdninated0eTon0t  — spend  to  the  first  problem 

PTdinate.  To  unsure  totally  parallel  operations  for  the 

of  the  calculations  on  the  ILLIAC,  the  requirement  is 

made  that  each  element  string  in  the  first  grid  dimension  con- 

stri„  ,\Ta'T  ^  Same  b3SiC  tyPe'  Tbat  iS'  °ne 

J  “  :  :\COnt:;n  b0th  and  hexahedral  (skewed- 

.  Cn  e  Present  operating  version  of  SW1S  treats 

nly  one  element  type  over  the  entire  grid:  8-„0de  hexalV 

3D.  4-node  quadralaterals  in  2-D,  and  2-„„de  line  segments 

low  r  restriction  may  be  lifted  in  the  future  to  al¬ 

ow  for  varying  element  types  between  element  strings. 
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The  requirement  for  uniformity  of  element  types 
imposes  restrictions  only  on  the  connectivity  between 
elements,  not  on  the  size  or  shape  of  the  elements.  No 
restrictions  are  placed  on  material  types,  applied  loads, 
or  boundary  conditions  in  any  dimension  of  the  grid.  The 
ILLIAC  SKIS  code  can  also  treat  irregular  geometric  shapes. 
The  number  of  elements  can  vary  between  node  strings  due 
to  irregular  grid  boundaries,  either  internal  or  external. 
However,  inefficiencies  in  PE  utilization  occur  when  the 
number  of  elements  in  a  node  string  is  not  a  multiple  of  64. 

3.3.2  Storage  Scheme 

The  ILLIAC  SWIS  code  uses  three  types  of  data  storage: 

1.  General  problem  descriptors  require  only  minor  storage 
and  include  such  terms  as  curvilinear  coordinate  designator, 
number  of  spatial  dimensions,  material  descriptions,  grid 
descriptions,  time  stepping  data,  and  data  for  printing 
selected  results. 

2.  Global  storage  contains  nodal  and  element  information 
for  the  entire  grid.  For  each  node  point  in  the  grid,  the 
present  version  of  SWIS  stores  coordinates  (position),  dis¬ 
placement,  velocity,  acceleration,  boundary  condition  type, 
applied  force  (or  displacement),  and  concentrated  mass. 
Material  type  and  element  stress  are  stored  for  each  element 
in  the  grid.  Nodal  and  element  storage  are  combined,  stor¬ 
ing  one  node  with  one  element,  to  yield  9,  16,  and  24  words 
of  storage  per  1-D,  2-D,  and  3-D  element,  respectively. 

Figure  3.4  illustrates  the  various  grid  terms  that  appear 

in  global  storage. 

The  global  storage  scheme  has  been  designed  first  to 
minimize  interactions  between  PE's  and  second  to  simplify 
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3.4- -Mult ipl  exed  storage  of  global  variables 


disk  accessing.  As  indicated  in  the  previous  section, 
global  storage  for  one  dimension  of  the  grid  (the  first 
dimension)  is  strung  across  PE's  with  variables  for  cross- 
section  n  appearing  in  PEk  where  k  is  the  remainder 
from  the  division  of  n-1  by  64.  Figure  3.3  illustrates  the 
relationship  between  the  PE's  and  the  grid  configuration. 

All  of  the  global  variables  needed  to  process  one  element, 
i.e.,  to  compute  element  restoring  forces,  are  always  con¬ 
tained  in  two  adjacent  PE's  (PEO  is  considered  to  be  adja¬ 
cent  to  PE63) .  Node  variables  are  routed  left  at  the  be¬ 
ginning  of  the  loop  for  processing  a  string  of  04  elements, 
and  the  computed  element  restoring  forces  are  routed  right 
at  the  end  of  the  loop.  Thus,  by  associating  global  storage 
with  grid  configuration,  we  have  reduced  PE  interactions  to 
simple,  predictable  routing  operations. 

Primarily  for  the  purpose  of  expediting  efficient 
use  of  the  14  disk,  the  various  node  and  element  quantities 
are  multiplexed  in  global  storage,  as  illustrated  in  Fig.  3.4. 
In  the  present  design,  global  arrays  will  appear  sequentially 
on  the  14  disk  with  grid  dimension  one  most  rapidly  varying 
in  the  grid  numbering  scheme.  Thus,  if  a  particular  global 
variable  is  located  at  position  n  on  the  disk  then  the 
corresponding  variable  for  the  next  higher  element  number 
would  be  located  at  position  n  +  (D2  +  1 ID) / 2  +  3  where  D 
is  the  number  of  spatial  dimensions  in  the  grid  and 
CD'1  +  1 1 D) / 2  +  3  is  the  number  of  terms  per  element  multi¬ 
plexed  into  global  storage. 


3.  Element  string  storage,  which  occupies  about  60  words 

of  fixed  storage  per  PE,  contains  variables  for  processing 
one  string  of  64  elements.  The  element  string  storage  serves 
as  buffer  storage  for  certain  global  variables  (node  posi¬ 
tions  and  node  velocities)  and  as  storage  for  intermediate 
variables  not  stored  globally  (curvilinear  coordinate  metrics, 
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element  interpolation  functions  and  their  spatial  deriva¬ 
tives,  element  strain  rates,  element  stress  rates,  and 
element  restoring  forces).  By  setting  aside  special  element 
storage,  we  have  managed  to  isolate  inner  PE  exchanges  from 
the  major  calculation  phase  of  the  code. 

3.3.3  Initialization 

As  described  in  Section  2.2,  a  stress  wave  calcula¬ 
tion  is  initiated  by  data  that  specifies  the  character  of 
the  grid,  the  (time -vary ing)  boundary  conditions,  the 
material  properties,  the  time  stepping  data,  the  initial 
conditions,  and  the  type  of  results  to  be  printed.  In  the 
present  configuration,  about  50  words  of  data  are  required 
to  initiate  a  3-D  calculation.  Approximately  half  of  these 
data  serve  to  define  the  grid.  Nonzero  initial  conditions 
and  time  varying  forcing  terms  require  additional  data. 

Before  proceding  with  the  time  stepping  calculations, 
global  storage  is  initialized.  Based  on  the  number  of  dimen¬ 
sions  in  the  grid  (and  on  the  number  stress  components  to 
be  included  in  non- Cartes ian  calculations),  the  global 
multiplexed  storage  is  dynamically  assigned  within  a  large 
segment  of  available  core  at  run  time.  Displacements, 
velocities,  and  element  stresses  are  set  to  a  default  value 
of  zero.  Boundary  conditions  and  material  types  are  set 
from  the  problem  input  data.  A  code-contained  grid  generator 
serves  to  produce  node  positions  in  both  Cartesian  and  curvi¬ 
linear  grids.  Nodal  masses  and  nodal  forces  are  computed  by 
numerical  integration,  Eq .  (2.20).  The  nodal  accelerations 
are  initialized  to 

{U}  =  [M]'1{Fi)  , 

Eq.  (2.29)  with  {R^}  =  {Q^}  =  0.  We  note  that,  with  the 
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exception  of  boundary  conditions  and  nodal  forces,  initiali¬ 
zation  calculations  are  conveniently  processed  in  parallel. 

3.3.4  Time  Stepping 

The  calculations  that  are  performed  to  numerically 
propagate  stress  waves  are  described  in  Section  2.6,  Eqs. 
(2.22)  —  (2.34).  Figure  3.5  illustrates  how  these  calcula¬ 
tions  are  performed  on  the  ILLIAC  using  the  following  basic 
operations:  Grid  positions  and  nodal  velocities  combine  to 

yield  element  strain  rates.  The  element  strain  rates  com¬ 
bine  with  material  properties  to  yield  element  stress  rates, 
which  are  used  to  update  total  element  stresses.  Element 
restoring  forces,  which  are  computed  from  the  element 
stresses,  are  combined  with  externally  applied  forces  (or 
other  boundary  conditions)  to  produce  nodal  accelerations. 
Nodal  displacements  and  nodal  velocities  are  then  advanced 
one  time  step  by  direct  numerical  integration  if  the  nodal 
accelerations.  This  time  stepping  procedure  continues  until 
the  wave  simulation  is  completed. 

We  note  that  the  algorithm  has  been  organized  so  that 
interactions  between  PE's  occur  at  only  two  points  in  the 
computing  sequence.  The  initial  operation  in  processing  a 
string  of  64  elements  involves  a  parallel  route  left.  This 
serves  to  bring  global  values  of  nodal  velocity  and  nodal 
coordinates  into  local  element  storage.  The  nodal  restoring 
forces  for  the  64-element  string  are  then  computed  in  paral¬ 
lel  with  no  inner  PE  communications.  These  operations  repre¬ 
sent  the  bulk  of  calculations  for  completing  one  time  step. 

The  final  operation  in  processing  a  string  of  64 
elements  involves  a  route  right.  The  nodal  restoring  forces 
are  divided  by  the  corresponding  nodal  masses  and  accumulated 
in  global  storage  as  contributions  to  nodal  accelerations. 
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Loop  on  time 

Loop  on  element  strings. 

~ (Loop  on  element  integration  points.) 
ShlST1'  coeffi““‘s.  Section  2.3.1, 

Compute  strain  rate,  Eq.  (2.22). 

stress^  Eq"??  Jff6’  Eg.C2.24), 
stress ,  Eq.'  (2!  26).'  Mping 

f^oi!mUlate  element  restoring  forces 

^from  numerical  integration,^, (2.27). 

Accumulate  nodal  acceleratinnc  • 
global  storage  Ea  r?  ?qw  ln 
a  parallel  route dP  '  }t  „ln?Ves 
^restoring  f^FEFsTT^—  h  element 

Satisfy  b°undary  conditions,  Eqs  (2  3(n 
0(ff3n)  C±0r  ^constrained  nodes! 'pe's  3 

Advance  nodal  veloritioc  c 

nodal  displacements,  Eq-’dW'  III’ 

nodal  coordinates,  Eq.  (2.34).  ’  ”d 


Flg.  3-5--sP^lJel  computing  scheme  used  in  the 


Strings  of  64  elements  are  sequentially  processed  with  a 
parallel  route  left  (global  to  element  storage)  in  the 
initial  operation  and  a  parallel  route  right  (element  to 
global  storage)  in  the  final  operation. 

The  acceleration  of  each  node  in  the  grid  is  deter¬ 
mined  from  the  processing  of  all  element  strings.  Boundary 
conditions  are  then  applied  by  sweeping  through  global  stor¬ 
age,  modifying  accelerations  of  only  those  nodes  with  force, 
displacement,  or  fixed  boundary  conditions.  At  the  same 
time,  the  advanced  nodal  velocities  and  displacements  are 
computed  in  parallel  by  time  integration.  This  operation 
completes  one  numerical  time  step. 

3.3.5  Code  Output 

For  the  testing  stage  of  SWIS,  we  have  relied  on  two 
basic  types  of  numerical  output.  Both  of  these  employ  the 
DISPLAY  software  facility  for  outputting  formatted  data 
during  run  time. 

One  method  is  to  output  an  entire  PE  row  of  informa¬ 
tion  at  a  selected  time  step.  Since  an  interesting  portion 
of  the  problem  often  has  its  orientation  across  PE's,  row 
output  can  provide  the  desired  results  in  condensed  form. 

One  test  problem  —  Lamb's  problem  as  described  in  the  next 
chapter  -  had  the  free  surface  of  the  grid  oriented  along 
the  first  grid  dimension.  Thus  six  row  outputs  provided 
horizontal  and  vertical  components  of  displacement  velocity 
and  acceleration  along  the  entire  free  surface.  This  ap¬ 
proach  is  less  convenient  for  grid  cross-sections  other  than 
those  oriented  across  PE  boundaries.  A  mechanism  is  used 
for  transferring  a  series  of  selected  quantities  into  a 
buffer  row  of  memory  for  output. 
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A  similar  mechanism  is  used  for  collecting  time 
histories.  The  displacement,  or  some  other  quantity  at 
selected  points  in  the  grid,  is  accumulated  into  a  buffer 
area  following  each  time  step.  When  the  buffer  is  full, 
it  is  output  and  the  accumulation  is  resumed.  In  this  way, 
the  behavior  of  a  point  in  the  grid  over  time  may  be  con¬ 
veniently  displayed. 

We  anticipate  incorporating  further  modes  of  program 
output  into  SWIS.  One  is  a  binary  output  mode  in  which  large 
quantities  of  unformatted  data  are  transferred  to  UNICON 
laser  storage  during  run  time.  Selected  portions  of  this 
file  could  be  transferred  over  the  ARPANET  for  printing  or 
plotting  at  another  site.  A  further  possibility  is  to  in¬ 
corporate  some  code  into  SWIS  for  generating  plots  using 
the  network  graphics  protocol  (NIC  #153S8,  1973).  This  plot 
information  could  be  plotted  at  any  site  equipped  with  pre¬ 
processors  for  the  graphics  protocol. 
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IV.  STRESS  WAVE  CALCULATIONS 


Our  primary  thrust  during  this  contract  period  has 
been  to  develop  code  on  the  ILLIAC  for  performing  3-D  stress 
wave  calculations.  Test  calculations  have  been  performed 
in  1-D,  2-D,  and  3-D  geometries  to  verify  the  resulting  code 
and  to  examine  particular  features  of  the  computing  algorithm 
A  sample  of  these  test  calculations  are  presented  below. 

4 . 1  LAMB'S  PROBLEM:  2-D  CARTESIAN  COORDINATES 

The  ILLIAC  SWIS  code  has  been  exercised  in  2-D 
Cartesian  geometry  (plane- strain)  for  treating  Lamb's 
problem:  A  line  load  (a  point  load  in  the  plane  of  the 

grid)  applied  as  an  impulse  to  the  free  surface  of  a  half¬ 
space.  Figure  4.1  illustrates  the  numerical  presentation 
and  provides  the  physical  parameters  that  were  used  in  the 
calculation.  Based  on  central  system  clock  times,  we  esti¬ 
mate  that  the  calculations  were  processed  at  the  rate  of 
0.4  m-sec  per  2-D  element  per  numerical  time  step. 

The  sharp  wave  forms  that  arise  from  the  concentrated 
impulse  loading  serve  as  a  critical  test  for  the  stress  wave 
computing  scheme.  Frazier,  et_  al .  (1973)  have  reported  on 

finite  element  and  finite  difference  treatments  of  Lamb's 
problem  using  S3's  UNIVAC  1108.  In  the  present  treatment 
of  Lamb's  problem  on  the  ILLIAC,  we  have  examined  alternats 
schemes  for  dealing  with  bending  modes  in  the  individual 
elements  (hour  glass  deformations,  see  Appendix  A)  and  al¬ 
ternate  schemes  for  damping  spurious  high  frequency  noise. 

We  have  also  made  an  effort  to  test  the  effectiveness  of 
transmitting  boundary  conditions  in  2-D  geometry. 

Computed  displacements  along  the  free  surface  are  com 
pared  with  mathematical  solutions  in  Figs.  4.2,  4.4  -  4.6. 

In  the  first  of  this  series  of  calculations,  Figs.  4.2  and 
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I  ine  load  =  0.985  *  101U  dynes  per  cm  along  x, 
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At  =  1  usee  =  0.555 


Physical  Parameters 

P  «=  mass  density  =  2.77  g/cm3 
Vp  *=  P  wave  velocity  =  0.555  cm/sec 
V£  =  S  wave  velocity  =  0.3145  cm/usec 
Vr  *=  Rayleigh  wave  velocity  =  0.2898  cm/usec 
L  *  total  impulse  =  //  pressure  dx^  dt 
*  1-97  x  io10  dyne-usec/cm 

(i),.  4 . 1  -  -  Prob  lem  description  used  for  analyzing  a  line  load 
impulse  on  a  half-space  (Lamb's  problem). 


Vertical  Displac 


Fig.  4 . 2  -  -  V er t ical  displacement  of  the  free  surface:  Damping 
coefficient  8p  =  0.80  for  P  waves,  =  0.80  for 
S  waves,  and  8,.  -  0.80  for  hour  glass  mot  ions . 

nner  element  stress  variations  due  to  bendinc  are 
not  included. 
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Fig.  4.3- -Time  history  of  a  point  on  the  free  surface,  39  cm  from 
the  applied  load,  Damping  coefficient  6p  =  0.80  for 
P  waves,  Bq  =  0.80  for  S  waves,  and  =  0.80  for 
hour  glass  motions.  Inner  element  stressHvariations 
due  to  bending  are  not  included. 
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6. 4 - -Vertical  displacement  of  the  free  surface:  Damping 
coefficient  Bp  =  0.40  for  P  waves,  Bg  =  0.80  for 
S  waves,  andp,,  =  0.40  for  hour  glass  mot  ions . 
Inner  element  stress  variations  due  to  bending  are 
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. 6- -Horizontal  displacement  of  the  free  surface:  Damp  inn 
coefficient  f p  -  0.40  for  P  waves,  =  0.80  for 
S  waves,  and  0j,  =  0.40  for  hour  glass  motion .  Fifty 
percent  of  innei'  element  stress  variations  due  to 
bending  arc  included. 


4.3,  a  single-point  integration  scheme  was  employed  for 
computing  restoring  forces  of  the  medium  (see  Eq .  (2.27) 
or  alternately  Eq.  (2.46)).  As  discussed  in  Appendix  A, 
this  scheme  does  not  take  into  account  the  spatial  varia¬ 
tions  in  stress  within  an  element.  Consequently,  the 
elements  contain  no  stiffness  against  binding.  The  result 
is  that  the  concentrated  load  configuration  excites  hour 
glass  deformation  modes  in  the  grid  in  addition  to  the  de¬ 
sired  wave  forms.  An  effort  was  made  to  control  this  un¬ 
desirable  response  by  introducing  a  special  artificial 
viscosity  with  the  sole  purpose  of  damping  the  hour  glass 
deformations.  While  this  procedure  was  successful  in  sup¬ 
pressing  the  hour  glass  modes  from  the  computed  velocities, 
we  see  from  Fig.  4.2  that  considerable  numerical  noise 
appears  in  the  displacement  field  in  the  vicinity  of  the 
applied  load.  By  employing  the  hour  glass  damping  scheme, 
a  well- formed  Rayleigh  wave  emerges  at  points  greater  than 
about  20  grid  dimensions  from  the  source.  The  time  history 
of  a  point  39  cm  from  the  source,  presented  in  Fig.  4.3, 
shows  no  trace  of  hour  glass  noise.  We  note  that  no  special 
treatment  is  required  to  control  hour  glass  deformation 
modes  when  the  surface  load  is  distributed  over  many  ele¬ 
ments;  there  is  essentially  no  excitation  of  hour  glass 
modes  in  this  case. 

The  SWIS  code  was  then  altered  to  take  account  of 
inner  element  variations  in  stress,  rather  than  to  arti¬ 
ficially  damp  the  hour  glass  modes.  The  vertical  component 
of  displacement  along  the  free  surface,  using  this  numeri¬ 
cally  consistent  formulation,  is  presented  in  Fig.  4.4. 

Also,  the  standard  numerical  damping  scheme,  Eq.  (2.26), 
was  altered  somewhat  for  this  calculation.  Shear  distor¬ 
tions  were  damped  using  a  coefficient  Bg  =  0.80,  the  same 
6  that  was  used  in  the  previous  calculation;  whereas  a 
smaller  damping  coefficient  of  0.40  was  applied  to 
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volumetric  distortions.  This  procedure  has  the  effect  of 
damping  high  frequency  P  and  S  waves  of  comparable  wave 
lengths  rather  than  damping  comparable  frequencies  as  is  the 
case  using  one  6  for  both  volumetric  and  distortional 
deformations  (see  Frazier,  et^  al_.  ,  Appendix  B,  1973). 

First,  we  see  from  the  results  presented  in  Fig.  4.4 
that  there  is  no  trace  of  hour  glass  deformation  present 
in  the  computed  displacement  field.  Also,  we  find  that  the 
reduction  in  the  artificial  damping  of  volumetric  distor¬ 
tions  results  in  a  more  distinct  P  wave  along  the  surface. 

As  a  final  numerical  experiment,  the  restoring  forces 
that  arise  from  bending  of  individual  elements  were  arti¬ 
ficially  reduced  by  50  percent.  This  inconsistent  analysis 
was  performed  to  examine  what  effect  the  bending  stiffness 
of  individual  elements  has  on  the  resulting  wave  forms. 
Cell-centered-stress  finite  difference  schemes  disregard 
the  bending  stiffness  of  individual  elements;  while  the 
consistent  finite  element  scheme,  used  to  produce  Fig.  4.4, 
slightly  over-estimates  the  bending  stiffness  of  elements. 

We  see  from  Fig.  4.5  that  a  slightly  sharper  Rayleigh  wave 
(higher  peak  displacement)  is  obtained  when  the  bending 
stiffness  of  the  elements  is  reduced.  It  appears  that  an 
accurate  representation  for  the  bending  stiffness  of  indi¬ 
vidual  elements  is  not  critical  to  this  particular  wave 
simulation.  Except  for  the  hour  glass  deformations  that 
permeate  the  displacement  field  near  the  point  of  loading 
(Fig.  4.2),  it  appears  that  artificial  damping,  which  is 
needed  to  control  numerical  dispersion,  has  as  much  effect 
on  peaked  wave  forms  as  the  bending  stiffness  of  the 
elements . 

Finally,  some  data  on  machine  reliability  have  been 
obtained  from  the  numerical  simulations  of  Lamb's  problem. 
Eight  separate  calculations  were  initiated,  each  set  for 
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500  time  steps.  Major  errors,  which  are  detected  by  orders 
of  magnitude  increase  in  the  total  energy,  occurred  in  all 
eight  calculations  prior  to  completion.  Two  calculations 
did  not  reach  100  time  steps  before  the  total  energy  jumped; 
one  calculation  progressed  just  beyond  250  time  steps  be¬ 
fore  major  spurious  errors  occurred.  Each  numerical  time 
step  requires  about  3  *  10s  floating  point  multiply  and  add 
operations,  approximately  100  operations  per  element  per 
time  step.  Thus,  it  appears  that  bits  are  being  altered 
in  the  exponents  at  the  rate  of  about  one  error  uer  3  *  107 
floating  point  multiply  and  add  operations  for  the  parti¬ 
cular  sequence  of  machine  instructions  activated  by  this 
2-D  wave  simulation.  In  some  instances,  we  also  find  that 
low  order  errors  have  occurred  in  the  calculations  before 
errors  of  very  large  magnitudes  appear.  We  plan  to  in¬ 
corporate  rigorous  energy  checks  and  an  automatic  restart 
mechanism  in  the  SWIS  code  to  detect  and  recover  from 
machine  errors. 


4  •  2  PLANE  WAVES:  3-D  CARTESIAN  COORDINATES 

The  first  test  calculation  of  the  SWIS  code  in  3-D 
geometry  has  been  designed  to  simulate  a  plane  wave  pro¬ 
duced  by  an  impulse  pressure.  As  with  the  problem  of  the 
previous  section,  the  propagating  wave  has  a  discontinuity 
in  the  displacement  field  and  a  singularity  in  the  velocity 
field.  We  do  not  expect  to  match  behavior  of  this  type 
accurately  but  rather  to  examine  the  limitations  of  the 
numerical  procedure.  The  response  of  a  linear  system  to 
an  impulse  load  provides  the  Green's  function  for  the  sys¬ 
tem.  Thus,  by  treating  an  impulse  loading,  we  obtain  the 
Green's  function  for  the  numerically  discrete  system. 

The  grid  configuration,  grid  parameters,  and  material 
parameters  are  presented  in  Fig.  4.7.  The  grid,  which  is 
core-contained,  involves  1575  cubic  elements  with  63  grid 
spaces  in  the  direction  of  propagation.  The  uniform  pres¬ 
sure  pulse  produces  particle  motion  only  in  the  direction 
of  loading.  Computed  displacement  and  velocity  at  selected 
time  intervals  (t  =  40,  80,  120  ysec)  are  presented  in 
Fig.  4.8.  The  computed  velocity  field  represents  a  numeri¬ 
cal  approximation  to  a  delta  function.  The  time  integral 
of  the  computed  particle  velocity  matches  the  time  integral 
of  the  analytical  singularity.  This  is  illustrated  by  the 
close  agreement  between  computed  and  analytical  displace¬ 
ment  fields  behind  the  wave  front. 

Based  on  central  system  clock  times,  we  estimate  that 
the  3-D  plane  wave  calculations  were  processed  at  the  rate 
of  1.2  m-sec  per  element  per  numerical  time  step. 
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Uniform  Normal  Pressure  -  100  kbars 


Physical  Parameters 

P  ■  mass  density  *  2.5  g/cm^ 

Vp  *=  P  wave  velocity  =  0.5  cm/ysec 
vs  *=  S  wave  velocity  ■=  0.25  cm/ysec 
6  *  damping  coefficient  =  0.15 


4 . 7- -Problem  description  used  to  simulate  a  unifo 
impulse  pressure  applied  to  the  surface  of  a 
half-space . 


Boundary  Surface 


4.3  BURIED  EXPLOSION:  1-D,  2-D,  3-D  SPHERICAL  COORDI1ATES 


The  facility  in  the  SWIS  code  for  processing  wave 
calculations  in  curvilinear  coordinates  is  employed  for 
simulating  elastic  compression  waves  emanating  from  a 
spherical  cavity.  A  step  pressure  with  an  exponential 
decay  is  analyzed  using  the  following  parameters: 

rc  =  cavity  radius  =  10  m 

Ar  =  radial  zone  size  =  0.2  m 

At  =  time  step  =  20  psec 

B  =  damping  coefficient  -  0.15 

V  =  P  wave  velocity  =  5  km/sec 

p  7 

Vg  =  S  wave  velocity  =  2.5  km/sec 

P  =  mass  density  =  2.0  g/cm3 

- 1 0 3 1 

p(t)  =  cavity  pressure  =  e  kbar 

The  elastic  explosion  calculation  is  processed  in 
spherical  coordinates  using  1-D,  2-D,  and  3-D  grid  configu¬ 
rations,  illustrated  in  Fig.  4.9.  Each  grid  has  63  elements 
in  the  radial  dimension.  This  provides  equal  resolution 
for  each  grid  configuration  in  the  spherically  symmetric 
explosion.  Consequently,  we  would  expect  very  nearly  iden¬ 
tical  results  from  the  1-D,  2-D,  and  3-D  calculations. 

The  computed  response  of  the  cavity  wall  is  compared 
with  analytic  solutions  by  Blake  (1952)  in  Fig.  4.10. 

Radial  displacement  and  velocity  are  compared  2.0  m-sec 
after  pressurizing  the  cavity  with  the  analytic  solutions 
in  Fig.  4.11.  While  the  computed  points  in  Figs.  4.10  and 
4.11  are  taken  from  the  1-D  spherical  calculation,  the  2-D 
and  3-D  calculations  gave  results  with  1  percent  of  the 
1-D  calculation.  Therefore,  these  figures  can  be  con¬ 
sidered  representative  of  the  computed  behavior  for  all 
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Fig.  4.11--Radial  velocity  and  displacement  2.0  m-sec  after 
pressurizing  spherical  cavity;  2-D  and  3-D  re¬ 
sults  within  1  percent  of  plotted  points  for  1-D 
calculat ion . 
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three  grid  configurations. 

More  test  calculations  need  to  be  performed  to 
verify  the  SKIS  code  for  other  curvilinear  coordinate  sys¬ 
tems.  However,  because  no  code  alterations  were  needed 
to  produce  the  2-D  and  3-D  spherical  results,  it  appears 
likely  that  the  curvilinear  formulation  is  generally 
operational.  We  note  that  the  wave  calculations  processed 
in  spherical  coordinates  required  about  50  percent  more 
computer  time  than  would  have  been  required  for  a  comparable 
Cartesian  calculation. 
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V.  SUMMARY  AND  CONCLUSIONS 


Considerable  progress  has  been  made  in  the  develop¬ 
ment  of  an  ILLIAC  computer  code  for  simulating  stress 
waves  in  the  earth.  A  novel  finite  element  scheme  has 
been  developed  for  processing  stress  waves  in  1-D,  2-D, 
and  3-D  curvilinear  geometries.  The  numerical  scheme 
accommodates  finite  deformations  and  nonlinear  material 
behavior  using  a  Lagrangian  formulation. 

In  the  initial  phase  of  this  program,  code  develop¬ 
ment  was  confined  to  small  amplitude  stress  waves  with 
linear  material  response.  During  this  period,  an  ILLIAC 
code  was  developed  for  time  stepping  stress  waves  through 
highly  irregular  1-D,  2-D,  and  3-D  grid  configurations. 

The  linear  code  was  made  operational  on  the  ILLIAC,  and 
test  calculations  were  performed  as  early  as  March  1973 
to  verify  the  code.  The  linear  code  development  effort  on 
the  ILLIAC  proved  valuable  in  the  subsequent  development 
of  the  more  general  SWIS  code. 

Because  of  the  manner  in  which  the  more  recent  non¬ 
linear  SIVIS  code  has  been  adapted  to  the  parallel  structure 
of  the  ILLIAC,  grid  points  are  associated  with  PE's,  This 
has  requ i red  a  certain  regularity  in  the  grid  configurations, 
namely,  the  grid  must  be  composed  of  an  assemblage  of  ele¬ 
ment  strings.  The  parallel  processing  capabilities  of  the 
ILLIAC  are  most  effectively  utilized  when  the  number  of 
elements  in  the  element  strings  is  some  multiple  of  64. 

This  is  in  contrast  with  the  earlier  linear  code  which 
processes  irregular  grids  just  as  efficiently  as  regular 
grids. 

The  nonlinear  SWIS  code  was  programmed  using  GLYPNIR 
and  debugged  directly  on  the  ILLIAC.  Successful  test  cal¬ 
culations  were  performed  on  the  ILLIAC  just  three  months 
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after  the  initial  programming  was  begun.  This  fact  is  a 
credit  to  the  total  ILLIAC  system:  the  ARPANET,  the  ILLIAC 
IV  computer,  and  the  peripheral  equipment  at  the  ILLIAC 
site.  Systems  failures  have  interfered  with  our  ability 
to  utilize  the  ILLIAC  on  the  average  of  about  two  weeks 
out  of  each  month  for  the  past  three  months.  System  re¬ 
liability  is  expected  to  improve  in  the  coming  months. 

A  series  of  core - conta ined  test  calculations  have 
been  performed  on  the  ILLIAC  to  verify  the  SWIS  code  in 
1-D,  2-D,  and  3-D  Cartesian  and  spherical  coordinates. 
Cartesian  coordinates  were  employed  to  simulate  a  line  load 
impulse  (Lamb's  problem)  using  3150  2-D  elements  and  a 
pressure  pulse  using  1575  3-D  elements.  Spherical  coordi¬ 
nates  were  employed  to  simulate  a  pressurized  spherical 
cavity  using  63  1-D  elements,  3780  2-D  elements,  and  1575 
3-D  elements.  The  calculations  were  processed  at  the  rate 
of  0.4  and  1.2  m-sec  per  element  per  numerical  time  step 
for  the  2-D  and  3-D  Cartesian  grids,  respectively.  Appro¬ 
ximately  50  percent  slower  computing  rates  were  obtained 
for  the  spherical  calculations.  Based  on  repetitive  execu¬ 
tions  of  Lamb's  problem,  we  have  observed  what  appears  to 
be  machine  errors.  We  estimate  that  errors  occur  in  the 
exponent  bits  at  the  rate  of  one  error  per  3  *  107  floating 
point  multiply  and  add  operations.  Errors  in  the  mantissa, 
which  are  slightly  more  difficult  to  detect,  have  also  been 
observed.  We  plan  to  incorporate  rigorous  energy  checks 
and  an  automatic  restart  mechanism  in  the  SWIS  code  to  de¬ 
tect  and  recover  from  machine  errors. 
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APPENDIX  A 


INNER- ELEMENT  STRESS  VARIATIONS 

The  stress  tensor  evaluated  at  the  centroid  of  a  2-D 
or  3-D  lccti  linear  element  docs  not  adequately  describe  the 
state  of  stress  in  the  element.  We  note  that  no  stress  is 
generated  at  the  centroid  as  the  element  undergoes  a  bending 
del  01  mat  ion,  illustrated  in  fig.  A.l.  Stress  wave  calcula¬ 
tions  that  use  cent roi dal  stresses  exclusively  can  result  in 
hour-glass  deformation  patterns  superposed  on  the  computed 
d i spl a cement  s . 

bet  A  denote  the  amplitude  of  the  bending  mode  pic¬ 
tured,  then 

\  =  ■  +  1,  -1,-1,  +  1  >  }u*j 

u  (  Z )  =  A  Z  Z 
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on  the  desired  displacement  and  velocity  fields. 

A  one-point  integration  scheme  is  not  sufficient  for 
treating  spatial  variations  in  stress  with  an  element.  When 
two  integration  points  per  element  dimension  arc  used  in 
Eq .  (2.20)  for  computing  restoring  forces  of  the  medium,  con 

siderably  more  calculations  arc  required  to  process  each 
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Let  A  denote  the  amplitude  of  the  bending  mode  pictured, 
then 
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Pig.  A.l--Bending  mode  of  deformation  with  nonuniform 
stress  tensor  that  vanishes  at  the  centroid. 
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element.  However,  a  two-point  integration  scheme  is  suf¬ 
ficiently  accurate  to  treat  quadratic  variations  in  stress; 
consequently,  little  additional  computing  effort  would  he 
required  to  process  higher  order  elements. 

For  the  increased  accuracy  of  a  two-point  integra¬ 
tion  scheme  to  he  effective,  a  more  complete  description  of 
the  spatially  varying  element  stresses  is  required.  We 
note  two  possibilities:  (1)  Store  element  stress  at  each 
integration  point.  This  procedure  requires  a  considerable 
amount  of  storage.  (2)  Store  the  restoring  forces  at  node 
points  and  update  the  restoring  forces  using  stress  rates 
evaluated  at  each  integration  point.  This  procedure  is 
currently  being  installed  in  the  SWIS  code. 

Alternate  procedures  were  used  for  processing  Lamb's 
problem,  presented  in  Section  IV.  l.’e  chose  to  treat  a  point 
load  (applied  at  one  discrete  point  in  time  and  space)  as 
a  critical  test  for  the  numerical  computing  scheme  in  pre¬ 
ference  to  a  distributed  load  for  which  the  hour-glass  mode 
is  negligible.  As  a  first  effort,  we  simply  damped  the 
bending  deformations  in  each  element.  This  procedure 
succeeded  in  removing  hour-glass  patterns  from  the  velocity 
field,  houever,  hour-glass  deformations  appeared  in  the  dis¬ 
placement  field  in  the  vicinity  of  the  impulsively  loaded 
surface.  Lamb's  problem  was  then  repeated  by  including  the 
restoring  terms  that  arise  from  bending  deformations  in  the 
element.  As  expected,  no  hour-glass  patterns  appear  in 
these  results. 
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APPENDIX  B 


The  following  scheme  for  banded  sparse  matrix  multi¬ 
plies  was  developed  by  Frazier  (1972).  The  sparse  matrix 
involved  is  a  matrix  of  influence  coefficients  [A],  which 
is  a  N  x  N  matrix  of  3  *  3  submatrices.  As  noted  in  Sec¬ 
tion  3.2,  the  uncompressed  matrix  requires  roughly  109  words 
of  storage  for  a  3-D  problem  containing  N  =  1 0 4  nodes.  How¬ 
ever,  since  most  nodes  have  just  26  immediate  neighbors  in 
a  3-D  rectilinear  gridwork,  there  will  usually  be  27  non-zero 
submatrices  in  a  single  row  of 

[A]  =  A  ;  n,m  =  1,2,  ...  N  . 

^  ^  II  }  Jll 

The  unnecessary  zeroes  are  compressed  out  of  [A]  to  yield 
a  N  x  27  matrix  of  3  x  3  submatrices.  From  the  node  number¬ 
ing  sequence  we  can  deduce  the  column  numbers  for  the  non¬ 
zero  terms  in  each  row,  i.e., 


m  =  mn  k  >  n  =  1 > 2 ,  .  .  .  N 

k  =  1,2,  ...  =  27 


(B.l) 


where  n  and  m  are  the  row  and  column  numbers  of  [A] , 

respectively .  and  N  is  the  total  number  of  node  points  in 

the  3-D  grid.  The  array  of  contributing  column  numbers  m  ,  , 

n  ,  k 

k  =  1,2,  ...  *  27  are  simply  the  node  numbers  adjacent  to 
node  n. 

Only  the  non-zero  multiplications  of  the  matrix  multiply 


B 

~n 


(B.2) 


are  performed  in  the  sparse  matrix  multiply,  which  can  now 
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be  expressed  as 


for  n  =  1,2,  ...  N  with 


A 

*  n ,  k 


(B .  3) 


The  compressed  matrix  [A]  should  be  arranged  on  the 
disk  so  that  each  term  arrives  in  the  processor  containing 
the  nodal  displacement  for  which  it  is  to  be  multiplied, 

Eq.  (B . 3) .  Thus,  An  ^  should  arrive  in  the  PEM  contain¬ 
ing  U  .  ,  .  . 

without  requiring  additional  shift  operations. 

If  [A]  remains  unchanged  over  many  multiply  operations, 
considerable  effort  can  be  devoted  to  arranging  the  non-zero 
terms  of  the  matrix  on  mass  storage  in  an  optimal  fashion 
for  processing. 


In  the  case  of  the  3-D  grid,  the  vectors  U  are 
nodal  displacements  of  three  components  and  require  three 
numbers  for  their  representation.  Let  U  be  the  coinpoi 
Hm  al°n8  the  x^  axis,  i  =  1,2,3.  Correspondingly, 
n,k  is  a  3  x  3  matrix  that  requires  nine  numbers  in  its 
.  epresentat ion .  Let  A ^  denote  the  ij  element  of 

this  matrix.  Then  Eq.  (B.3)  can  be  written 
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(B .  4 ) 


where  B.^  is  the  component  of  Bn  along  the 

The  nodal  displacements  are  arranged  on 
to  flow  into  the  PEM's  (denoted  p)  by  PE  rows 


no 


axis. 

the  disk 
(denoted 


r)  so  that  U  ->-  p  =  0,  r  =  0;  U  p 
U31  -  p  =  0,  r  =  2;  U12  -  p  =  1,  r  =  0; 
r  =  2;U  p  =  0 ,  r  =  3 ;  etc. 

'1)65 

In  general  we  have 


=  0,  r  =  1; 

.  .  .  U  -*■  p  =  63 , 

-  3  >  6  4 


Ujm  *  P(m-l)  -  (m-1)  -  64[g^i]  , 


r  =  3 


m 


+  j-i 


(B .  5) 


where  denotes  fixed  point  division.  Thus,  p(m)  in 

the  above  equation  is  the  remainder  of  jp-.  This  storage  con¬ 


figuration  in  the  PE’s  is  achieved  by  loading  IJ.  ,  m  =  1,2, 

°  j  m 

...  N,  on  the  disk  in  the  sequential  order  Og,  S  =  1,2,  ... 
where 


S  =  m  +  128  jr-^i  +  64  (j 
j  _  1 1  2 , 3  . 


(j-D  , 


(B.6) 


The  condition  for  A.  to  arrive  in  the  processor 

in ,  j  k 

containing  U.  is  expressed,  using  Eq.  (B.5),  as 

J  n ,  k 


in,  jk 


p  (“n.k'1’ 


(B.7) 


The  PE  row  number  r  to  be  occupied  by  the  various  non¬ 
zero  terms  of  the  sparse  matrix  is  somewhat  more  difficult  to 
express  because  of  the  arbitrariness  of  the  node  numbering 
scheme.  The  node  numbers  n  should  increase  monotonically 
(but  not  necessarily  sequentially)  with  increasing  row  number 
r  in  each  PE  so  that  the  row  number  n  of  the  sparse  matrix 
can  be  processed  in  ascending  order.  However,  in  general,  no 
more  than  27  of  the  64  PE’s  will  contain  a  for  which 

Ain  jk  is  non"zero  for  any  particular  matrix  r8vtknumber  n. 
That  is,  only  about  one-third  of  the  PE's  will  contain  nodal 
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displacements  that  are  adjacent  to  node  number  n  in  the 
spatially  zoned  continuum.  Furthermore,  a  single  PE  may,  in 
some  instances,  contain  more  than  one  neighbor  nodal  displace- 
ment  but  rarely  more  than  nine. 

When  performing  the  multiplications  that  contribute  to 
matrix  ro\s  number  n,  there  is  no  need  to  make  the  noncontri¬ 
buting  PE's  inoperative.  Each  noncontributing  PE  can  simul¬ 
taneously  per f o rm  multiplications  for  the  next  higher  matrix 
rou  number  for  which  the  PE  will  have  a  contribution.  If  the 
compressed  matrix  [A]  is  loaded  into  the  PE's  in  any  proper 
sequence,  this  work-ahead  scheme  can  be  carried  out  by  per¬ 
forming  multiplications  in  each  PE  in  the  sequence  that  the 
[A]  terms  are  loaded  from  mass  storage.  This  desired  storage 
configuration  in  the  various  PE's  is  achieved  by  loading 

^in,jk’  n  =  • L'  N;  k  =  1,2  ...  ~ 2 7 ,  on  the  disk  in  the 

sequential  order  Ag ,  S  =  1 , 2 ,  ...  a  10  x  27  x  n  where 

S  =  64r  +  (p+1)  +  64(3i+j-4)  ,  (B.8) 

i  and  j  =1,2,3, 

in  which  r  -  0,  1,2  ...  s  10  x  27  x  n/64  and  p  =  0,  1,2  ...  63 
are  the  row  and  PE  number,  respectively.  The  PE  number  is 

P<-mn,k'1^  The  row  number  ft>r  each  PE  is  expressed  by  summing 
all  previous  entries  in  the  particular  PE,  i.e.,  the  row  num¬ 
ber  for  PEp  is  expressed  in  terms  of  n  and  k  by 
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where  «pp'  =  0  for  p  /  p'  and  6pp,  =  1  for  p  =  p'  and 
where,  just  as  in  Eq.  (B.7) 
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CB. 10) 


I  very  10th  term  in  the  array  Ag  starting  with  S  =  1  is 
used  to  store  two  index  numbers-  m  =  mn  ^  t0  identify  the 
nodal  displacement  that  is  to  be  multiplied  and  n  to 
identify  the  matrix  row  to  which  the  multiplication  contri¬ 
butes,  Eq.  (B . 3 ) . 

Using  the  storage  schemes  defined  above,  i.e.,  start¬ 
ing  from  a  point  in  the  computations  in  which  { U >  and  [A] 
appear  in  their  prescribed  sequences  on  the  disk,  the  sparse 
matrix  multiplication  of  Eq.  (B.3)  proceeds  as  follows: 

1-  {U}  is  loaded  into  core  by  PL  rows  using  a 

single  access. 

2.  The  serially  arranged  version  of  [A],  defined 
Eqs.  (B.7)  -  (B.10),  is  accessed  and  its  load¬ 
ing  is  initiated.  The  terms  flow  into  core  by 
PE  rows  starting  with  row  0.  After  30  rows  are 
filled  the  loading  is  continued,  uninterrupted, 
back  at  row  0  overwriting  the  previously  loaded 
terms.  The  computations  and  manipulations  of 
the  following  steps  are  carreid  out  before  the 
matrix  term?  are  overwritten. 

3.  Initialize  r  =  -10;  r  =  0;  B  =  0  for  r  =  0 

o  2  r  y 

1,2,  ...  99;  n  =  1. 

o 

4.  Three  sets  of  three  multiplications  and  product 
summation  arc  performed  simultaneously  in  all 
processors . 
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(r  +10)  / 1  - 6  ) 

o  \  2o, r  / 


r  =  I  m~ 1 

ri  L^“ 


( r  ,  if  n  =  B 
r  =  2  r2 

( r  +4 ,  if  n  >  B 


n  =  A  (first  32  bits) 
o 

m  =  A  (second  32  bits) 

*  A 


Also,  store  matrix  row  number  of  the  product 
contribution 


Set  n  =  n  +1 
o  o 

5.  Check  for  the  completion  of  matrix  rou  number  n 

If  n^  =  MIN’ (n)  return  to  step  (4);  otherwise, 

i.e.,  n  ^  <  MIN (n)  continue  on  to  step  (6). 

MIN (n)  is  the  minimum  b  among  all  64  PE's. 

o 

6.  Perform  a  row  sum  on  B^  i  =  1,2,3  among  those 
PE's  for  which  b^  -  n^.  With  only  PEp  operative, 
shift  the  result  to  B^  +i.j,  i  =  1,2,3  where 

i 

P  =  P (n  -1) 

0 

In  -11 

r  =  0 _ 


Operating  only  those  PE's  for  which  b  = 
Br  =  Br+4  lor  r  =  0,  1,  ...  99  . 
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If  the  next  » en  PE  rows  of  the  [ A 1  matrix  have 
been  loaded,  return  to  step  (4);  otherwise  return 
to  step  '5). 

The  computations  and  manipulations  of  steps  (4)  —  (7) 
displayed  in  Table  A.l. 


Cm 


TABLi:  A.l  COMPUTATIONS  AND  MANIPULATIONS  OF 
STFPS  (4)  -  (7) 
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